mirror of
https://github.com/ccfos/nightingale.git
synced 2026-03-03 06:29:16 +00:00
Compare commits
188 Commits
default-te
...
stable
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
df9ba52e71 | ||
|
|
ba6d2b664d | ||
|
|
2ddcf507f9 | ||
|
|
17cc588a9d | ||
|
|
e9c7eef546 | ||
|
|
7451ad2e23 | ||
|
|
31b3434e87 | ||
|
|
2576a0f815 | ||
|
|
0ac4bc7421 | ||
|
|
95e6ea98f4 | ||
|
|
dc60c74c0d | ||
|
|
a15adc196d | ||
|
|
f89ef04e85 | ||
|
|
f55cd9b32e | ||
|
|
305a898f8b | ||
|
|
60c31d8eb2 | ||
|
|
7da49a8c68 | ||
|
|
65b1410b09 | ||
|
|
3901671c0e | ||
|
|
9c02937e81 | ||
|
|
0a255ee33a | ||
|
|
8dc198b4b1 | ||
|
|
9696f63a71 | ||
|
|
03f56f73b4 | ||
|
|
7b415c91af | ||
|
|
2abf089444 | ||
|
|
e504dab359 | ||
|
|
989ed62e8d | ||
|
|
b7197d10eb | ||
|
|
f4de256388 | ||
|
|
3f5126923f | ||
|
|
5d3e70bc4c | ||
|
|
bb2c5202ad | ||
|
|
3acf3d7bf9 | ||
|
|
a79810b15d | ||
|
|
f61cb532f8 | ||
|
|
34a5a752f4 | ||
|
|
9be3deeebd | ||
|
|
2ceed84120 | ||
|
|
8fbe257090 | ||
|
|
ae35d780c6 | ||
|
|
4d2cdfce53 | ||
|
|
a0e4d0d46e | ||
|
|
dd07d04e2f | ||
|
|
61203e8b75 | ||
|
|
f24bc53c94 | ||
|
|
ef6abe3fdc | ||
|
|
461361d3d0 | ||
|
|
52b3afbd97 | ||
|
|
652439bb85 | ||
|
|
6f0c13d4e7 | ||
|
|
c9f46bad02 | ||
|
|
75146f3626 | ||
|
|
50aafbd73d | ||
|
|
b975cb3c9d | ||
|
|
11deb4ba26 | ||
|
|
ec927297d6 | ||
|
|
f476d7cd63 | ||
|
|
410f3bbceb | ||
|
|
2ad53d6862 | ||
|
|
fc392e4af1 | ||
|
|
9c83c7881a | ||
|
|
f1259d1dff | ||
|
|
d9d59b3205 | ||
|
|
d11cfb0278 | ||
|
|
5adcfc6eaa | ||
|
|
037152ad72 | ||
|
|
2de304d4f2 | ||
|
|
03c56d048f | ||
|
|
1cddb4eca0 | ||
|
|
2dc033944d | ||
|
|
63e6c78e71 | ||
|
|
e1f04eebe7 | ||
|
|
ce17e09f66 | ||
|
|
c98c1d3b90 | ||
|
|
ae3218e6d5 | ||
|
|
7497cc0f28 | ||
|
|
96c4cc7c98 | ||
|
|
1f7314f6b4 | ||
|
|
86d478a0d4 | ||
|
|
b45023630f | ||
|
|
2177049487 | ||
|
|
d3d1e7019f | ||
|
|
f2ad0b9594 | ||
|
|
9c79233b3c | ||
|
|
9ea5de1257 | ||
|
|
3ec97665ac | ||
|
|
bb4eeca2ab | ||
|
|
cc6a5be27f | ||
|
|
630df8a954 | ||
|
|
e28ab6368b | ||
|
|
751c78be4b | ||
|
|
5311bf90d5 | ||
|
|
c464689c6a | ||
|
|
442426be38 | ||
|
|
9a28139d43 | ||
|
|
25b768188f | ||
|
|
b794b62960 | ||
|
|
d7e00a5a49 | ||
|
|
19e6cfe7d2 | ||
|
|
63baa7b6f3 | ||
|
|
407fc90677 | ||
|
|
7da4c99d92 | ||
|
|
6b46e7e83f | ||
|
|
514ccd5f90 | ||
|
|
4565b80717 | ||
|
|
2bac6588c4 | ||
|
|
fc293cb01c | ||
|
|
73f9548242 | ||
|
|
7c91e51c08 | ||
|
|
a4867c406d | ||
|
|
bfea83ae75 | ||
|
|
7a2832c377 | ||
|
|
3f6c54a712 | ||
|
|
1bb590ce6d | ||
|
|
656326458f | ||
|
|
c6ab3ad2b3 | ||
|
|
d050cf72e9 | ||
|
|
084cc1893e | ||
|
|
cd01123b59 | ||
|
|
23ce84d41c | ||
|
|
4764cc2419 | ||
|
|
da66401576 | ||
|
|
0024c9d99c | ||
|
|
96d3b48f10 | ||
|
|
6a0e7a810f | ||
|
|
5b2513b7a1 | ||
|
|
7cec16eaf0 | ||
|
|
17dbb3ec77 | ||
|
|
00822c8404 | ||
|
|
55de30d6c7 | ||
|
|
8b7dbed27e | ||
|
|
71b8fa27d0 | ||
|
|
31174d719e | ||
|
|
5b5bb22ffd | ||
|
|
e98fe9ea2e | ||
|
|
32e9ded393 | ||
|
|
8293ca20be | ||
|
|
6c4ddfc349 | ||
|
|
cd0c478515 | ||
|
|
2cd25ac0e5 | ||
|
|
bb99ba3d1c | ||
|
|
64405dca5d | ||
|
|
69ea9ca8f8 | ||
|
|
41d0f2fcda | ||
|
|
93df1c0fbc | ||
|
|
86e952788d | ||
|
|
e890f2616f | ||
|
|
6c2ee584e5 | ||
|
|
5f07fc3010 | ||
|
|
20fa310ba9 | ||
|
|
0e3b08be9a | ||
|
|
b7d971d7c8 | ||
|
|
4373ae7f0b | ||
|
|
053325a691 | ||
|
|
c54267aa3a | ||
|
|
74dc430886 | ||
|
|
dc79ee4687 | ||
|
|
e154c946e6 | ||
|
|
08bfc0b388 | ||
|
|
5338270aef | ||
|
|
00550ba2c7 | ||
|
|
c58bec23bf | ||
|
|
a5b77be0ab | ||
|
|
f529681c35 | ||
|
|
e3042dd6d5 | ||
|
|
1ebab4fcb0 | ||
|
|
ccf38b6da7 | ||
|
|
9a0a687727 | ||
|
|
d00510978d | ||
|
|
9b478d98fd | ||
|
|
4845ca5bdb | ||
|
|
a844d2b091 | ||
|
|
69ca7f3b93 | ||
|
|
b9c6c33ceb | ||
|
|
5099d3c040 | ||
|
|
e34f8ac701 | ||
|
|
ab82a6f910 | ||
|
|
57f8bd3612 | ||
|
|
8ab96e2cea | ||
|
|
0a2e23c285 | ||
|
|
5c1d4077e2 | ||
|
|
2a46d9f98e | ||
|
|
ce5c213593 | ||
|
|
771a8d121b | ||
|
|
af88b0e283 | ||
|
|
8e5d7f2a5b | ||
|
|
1a22211a5d |
32
README.md
32
README.md
@@ -29,15 +29,15 @@
|
||||
|
||||
## 夜莺 Nightingale 是什么
|
||||
|
||||
夜莺监控是一款开源云原生观测分析工具,采用 All-in-One 的设计理念,集数据采集、可视化、监控告警、数据分析于一体,与云原生生态紧密集成,提供开箱即用的企业级监控分析和告警能力。夜莺于 2020 年 3 月 20 日,在 github 上发布 v1 版本,已累计迭代 100 多个版本。
|
||||
夜莺监控是一款开源云原生观测分析工具,采用 All-in-One 的设计理念,集数据采集、可视化、监控告警、数据分析于一体,与云原生生态紧密集成,提供开箱即用的企业级监控分析和告警能力。夜莺于 2020 年 3 月 20 日,在 GitHub 上发布 v1 版本,已累计迭代 100 多个版本。
|
||||
|
||||
夜莺最初由滴滴开发和开源,并于 2022 年 5 月 11 日,捐赠予中国计算机学会开源发展委员会(CCF ODC),为 CCF ODC 成立后接受捐赠的第一个开源项目。夜莺的核心研发团队,也是 Open-Falcon 项目原核心研发人员,从 2014 年(Open-Falcon 是 2014 年开源)算起来,也有 10 年了,只为把监控这个事情做好。
|
||||
|
||||
|
||||
## 快速开始
|
||||
- 👉[文档中心](https://flashcat.cloud/docs/) | [下载中心](https://flashcat.cloud/download/nightingale/)
|
||||
- ❤️[报告 Bug](https://github.com/ccfos/nightingale/issues/new?assignees=&labels=&projects=&template=question.yml)
|
||||
- ℹ️为了提供更快速的访问体验,上述文档和下载站点托管于 [FlashcatCloud](https://flashcat.cloud)
|
||||
- 👉 [文档中心](https://flashcat.cloud/docs/) | [下载中心](https://flashcat.cloud/download/nightingale/)
|
||||
- ❤️ [报告 Bug](https://github.com/ccfos/nightingale/issues/new?assignees=&labels=&projects=&template=question.yml)
|
||||
- ℹ️ 为了提供更快速的访问体验,上述文档和下载站点托管于 [FlashcatCloud](https://flashcat.cloud)
|
||||
|
||||
## 功能特点
|
||||
|
||||
@@ -50,6 +50,11 @@
|
||||
|
||||
## 截图演示
|
||||
|
||||
|
||||
你可以在页面的右上角,切换语言和主题,目前我们支持英语、简体中文、繁体中文。
|
||||
|
||||

|
||||
|
||||
即时查询,类似 Prometheus 内置的查询分析页面,做 ad-hoc 查询,夜莺做了一些 UI 优化,同时提供了一些内置 promql 指标,让不太了解 promql 的用户也可以快速查询。
|
||||
|
||||

|
||||
@@ -78,30 +83,21 @@
|
||||
|
||||

|
||||
|
||||
## 近期计划
|
||||
|
||||
- [ ] 仪表盘:支持内嵌 Grafana
|
||||
- [ ] 告警规则:通知时支持配置过滤标签,避免告警事件中一堆不重要的标签
|
||||
- [x] 告警规则:支持配置恢复时的 Promql,告警恢复通知也可以带上恢复时的值了
|
||||
- [ ] 机器管理:自定义标签拆分管理,agent 自动上报的标签和用户在页面自定义的标签分开管理,对于 agent 自动上报的标签,以 agent 为准,直接覆盖服务端 DB 中的数据
|
||||
- [ ] 机器管理:机器支持角色字段,即无头标签,用于描述混部场景
|
||||
- [ ] 机器管理:把业务组的 busigroup 标签迁移到机器的属性里,让机器支持挂到多个业务组
|
||||
- [ ] 告警规则:增加 Host Metrics 类别,支持按照业务组、角色、标签等筛选机器,规则 promql 支持变量,支持在机器颗粒度配置变量值
|
||||
- [ ] 告警通知:重构整个通知逻辑,引入事件处理的 pipeline,支持对告警事件做自定义处理和灵活分派
|
||||
|
||||
## 交流渠道
|
||||
- 报告Bug,优先推荐提交[夜莺GitHub Issue](https://github.com/ccfos/nightingale/issues/new?assignees=&labels=kind%2Fbug&projects=&template=bug_report.yml)
|
||||
- 推荐完整浏览[夜莺文档站点](https://flashcat.cloud/docs/content/flashcat-monitor/nightingale-v7/introduction/),了解更多信息
|
||||
- 推荐搜索关注夜莺公众号,第一时间获取社区动态:`夜莺监控Nightingale`
|
||||
- 日常答疑、技术分享、用户之间的交流,统一使用知识星球,大伙可以免费加入交流,[入口在这里](https://download.flashcat.cloud/ulric/20240319095409.png)
|
||||
- 日常问题交流:
|
||||
- QQ群:730841964
|
||||
- [加入微信群](https://download.flashcat.cloud/ulric/20241022141621.png),如果二维码过期了,可以联系我(我的微信:`picobyte`)拉群,备注: `夜莺互助群`
|
||||
|
||||
## 广受关注
|
||||
[](https://star-history.com/#ccfos/nightingale&Date)
|
||||
|
||||
|
||||
## 社区共建
|
||||
- ❇️请阅读浏览[夜莺开源项目和社区治理架构草案](./doc/community-governance.md),真诚欢迎每一位用户、开发者、公司以及组织,使用夜莺监控、积极反馈 Bug、提交功能需求、分享最佳实践,共建专业、活跃的夜莺开源社区。
|
||||
- 夜莺贡献者❤️
|
||||
- ❇️ 请阅读浏览[夜莺开源项目和社区治理架构草案](./doc/community-governance.md),真诚欢迎每一位用户、开发者、公司以及组织,使用夜莺监控、积极反馈 Bug、提交功能需求、分享最佳实践,共建专业、活跃的夜莺开源社区。
|
||||
- ❤️ 夜莺贡献者
|
||||
<a href="https://github.com/ccfos/nightingale/graphs/contributors">
|
||||
<img src="https://contrib.rocks/image?repo=ccfos/nightingale" />
|
||||
</a>
|
||||
|
||||
129
README_en.md
129
README_en.md
@@ -1,104 +1,113 @@
|
||||
<p align="center">
|
||||
<a href="https://github.com/ccfos/nightingale">
|
||||
<img src="doc/img/Nightingale_L_V.png" alt="nightingale - cloud native monitoring" width="240" /></a>
|
||||
<img src="doc/img/Nightingale_L_V.png" alt="nightingale - cloud native monitoring" width="100" /></a>
|
||||
</p>
|
||||
<p align="center">
|
||||
<b>Open-source Alert Management Expert, an Integrated Observability Platform</b>
|
||||
</p>
|
||||
|
||||
<p align="center">
|
||||
<img alt="GitHub latest release" src="https://img.shields.io/github/v/release/ccfos/nightingale"/>
|
||||
<a href="https://n9e.github.io">
|
||||
<a href="https://flashcat.cloud/docs/">
|
||||
<img alt="Docs" src="https://img.shields.io/badge/docs-get%20started-brightgreen"/></a>
|
||||
<a href="https://hub.docker.com/u/flashcatcloud">
|
||||
<img alt="Docker pulls" src="https://img.shields.io/docker/pulls/flashcatcloud/nightingale"/></a>
|
||||
<img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/ccfos/nightingale">
|
||||
<img alt="GitHub Repo issues" src="https://img.shields.io/github/issues/ccfos/nightingale">
|
||||
<img alt="GitHub Repo issues closed" src="https://img.shields.io/github/issues-closed/ccfos/nightingale">
|
||||
<img alt="GitHub forks" src="https://img.shields.io/github/forks/ccfos/nightingale">
|
||||
<a href="https://github.com/ccfos/nightingale/graphs/contributors">
|
||||
<img alt="GitHub contributors" src="https://img.shields.io/github/contributors-anon/ccfos/nightingale"/></a>
|
||||
<img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/ccfos/nightingale">
|
||||
<img alt="GitHub forks" src="https://img.shields.io/github/forks/ccfos/nightingale">
|
||||
<br/><img alt="GitHub Repo issues" src="https://img.shields.io/github/issues/ccfos/nightingale">
|
||||
<img alt="GitHub Repo issues closed" src="https://img.shields.io/github/issues-closed/ccfos/nightingale">
|
||||
<img alt="GitHub latest release" src="https://img.shields.io/github/v/release/ccfos/nightingale"/>
|
||||
<img alt="License" src="https://img.shields.io/badge/license-Apache--2.0-blue"/>
|
||||
<a href="https://n9e-talk.slack.com/">
|
||||
<img alt="GitHub contributors" src="https://img.shields.io/badge/join%20slack-%23n9e-brightgreen.svg"/></a>
|
||||
<img alt="License" src="https://img.shields.io/badge/license-Apache--2.0-blue"/>
|
||||
</p>
|
||||
<p align="center">
|
||||
An open-source cloud-native monitoring system that is <b>all-in-one</b> <br/>
|
||||
<b>Out-of-the-box</b>, it integrates data collection, visualization, and monitoring alert <br/>
|
||||
We recommend upgrading your <b>Prometheus + AlertManager + Grafana</b> combination to Nightingale!
|
||||
</p>
|
||||
|
||||
|
||||
|
||||
[English](./README_en.md) | [中文](./README.md)
|
||||
|
||||
## What is Nightingale
|
||||
|
||||
## Highlighted Features
|
||||
Nightingale aims to combine the advantages of Prometheus and Grafana. It manages alert rules and visualizes metrics, logs, traces in a beautiful WebUI.
|
||||
|
||||
- **Out-of-the-box**
|
||||
- Supports multiple deployment methods such as **Docker, Helm Chart, and cloud services**, integrates data collection, monitoring, and alerting into one system, and comes with various monitoring dashboards, quick views, and alert rule templates. **It greatly reduces the construction cost, learning cost, and usage cost of cloud-native monitoring systems**.
|
||||
- **Professional Alerting**
|
||||
- Provides visual alert configuration and management, supports various alert rules, offers the ability to configure silence and subscription rules, supports multiple alert delivery channels, and has features such as alert self-healing and event management.
|
||||
- **Cloud-Native**
|
||||
- Quickly builds an enterprise-level cloud-native monitoring system through a turnkey approach, supports multiple collectors such as [Categraf](https://github.com/flashcatcloud/categraf), Telegraf, and Grafana-agent, supports multiple data sources such as Prometheus, VictoriaMetrics, M3DB, ElasticSearch, and Jaeger, and is compatible with importing Grafana dashboards. **It seamlessly integrates with the cloud-native ecosystem**.
|
||||
- **High Performance and High Availability**
|
||||
- Due to the multi-data-source management engine of Nightingale and its excellent architecture design, and utilizing a high-performance time-series database, it can handle data collection, storage, and alert analysis scenarios with billions of time-series data, saving a lot of costs.
|
||||
- Nightingale components can be horizontally scaled with no single point of failure. It has been deployed in thousands of enterprises and tested in harsh production practices. Many leading Internet companies have used Nightingale for cluster machines with hundreds of nodes, processing billions of time-series data.
|
||||
- **Flexible Extension and Centralized Management**
|
||||
- Nightingale can be deployed on a 1-core 1G cloud host, deployed in a cluster of hundreds of machines, or run in Kubernetes. Time-series databases, alert engines, and other components can also be decentralized to various data centers and regions, balancing edge deployment with centralized management. **It solves the problem of data fragmentation and lack of unified views**.
|
||||
Originally developed and open-sourced by Didi, Nightingale was donated to the China Computer Federation Open Source Development Committee (CCF ODC) on May 11, 2022, becoming the first open-source project accepted by the CCF ODC after its establishment.
|
||||
|
||||
|
||||
#### If you are using Prometheus and have one or more of the following requirement scenarios, it is recommended that you upgrade to Nightingale:
|
||||
## Quick Start
|
||||
|
||||
- Multiple systems such as Prometheus, Alertmanager, Grafana, etc. are fragmented and lack a unified view and cannot be used out of the box;
|
||||
- The way to manage Prometheus and Alertmanager by modifying configuration files has a big learning curve and is difficult to collaborate;
|
||||
- Too much data to scale-up your Prometheus cluster;
|
||||
- Multiple Prometheus clusters running in production environments, which faced high management and usage costs;
|
||||
- 👉 [Documentation](https://flashcat.cloud/docs/) | [Download](https://flashcat.cloud/download/nightingale/)
|
||||
- ❤️ [Report a Bug](https://github.com/ccfos/nightingale/issues/new?assignees=&labels=&projects=&template=question.yml)
|
||||
- ℹ️ For faster access, the above documentation and download sites are hosted on [FlashcatCloud](https://flashcat.cloud).
|
||||
|
||||
#### If you are using Zabbix and have the following scenarios, it is recommended that you upgrade to Nightingale:
|
||||
## Features
|
||||
|
||||
- Monitoring too much data and wanting a better scalable solution;
|
||||
- A high learning curve and a desire for better efficiency of collaborative use in a multi-person, multi-team model;
|
||||
- Microservice and cloud-native architectures with variable monitoring data lifecycles and high monitoring data dimension bases, which are not easily adaptable to the Zabbix data model;
|
||||
- **Integration with Multiple Time-Series Databases:** Supports integration with various time-series databases such as Prometheus, VictoriaMetrics, Thanos, Mimir, M3DB, and TDengine, enabling unified alert management.
|
||||
- **Advanced Alerting Capabilities:** Comes with built-in support for multiple alerting rules, extensible to common notification channels. It also supports alert suppression, silencing, subscription, self-healing, and alert event management.
|
||||
- **High-Performance Visualization Engine:** Offers various chart styles with numerous built-in dashboard templates and the ability to import Grafana templates. Ready to use with a business-friendly open-source license.
|
||||
- **Support for Common Collectors:** Compatible with [Categraf](https://flashcat.cloud/product/categraf), Telegraf, Grafana-agent, Datadog-agent, and various exporters as collectors—there's no data that can't be monitored.
|
||||
- **Seamless Integration with [Flashduty](https://flashcat.cloud/product/flashcat-duty/):** Enables alert aggregation, acknowledgment, escalation, scheduling, and IM integration, ensuring no alerts are missed, reducing unnecessary interruptions, and enhancing efficient collaboration.
|
||||
|
||||
|
||||
#### If you are using [open-falcon](https://github.com/open-falcon/falcon-plus), we recommend you to upgrade to Nightingale:
|
||||
- For more information about open-falcon and Nightingale, please refer to read [Ten features and trends of cloud-native monitoring](https://mp.weixin.qq.com/s?__biz=MzkzNjI5OTM5Nw==&mid=2247483738&idx=1&sn=e8bdbb974a2cd003c1abcc2b5405dd18&chksm=c2a19fb0f5d616a63185cd79277a79a6b80118ef2185890d0683d2bb20451bd9303c78d083c5#rd)。
|
||||
|
||||
## Getting Started
|
||||
|
||||
[https://n9e.github.io/](https://n9e.github.io/)
|
||||
|
||||
## Screenshots
|
||||
|
||||
https://user-images.githubusercontent.com/792850/216888712-2565fcea-9df5-47bd-a49e-d60af9bd76e8.mp4
|
||||
You can switch languages and themes in the top right corner. We now support English, Simplified Chinese, and Traditional Chinese.
|
||||
|
||||

|
||||
|
||||
### Instant Query
|
||||
|
||||
Similar to the built-in query analysis page in Prometheus, Nightingale offers an ad-hoc query feature with UI enhancements. It also provides built-in PromQL metrics, allowing users unfamiliar with PromQL to quickly perform queries.
|
||||
|
||||

|
||||
|
||||
### Metric View
|
||||
|
||||
Alternatively, you can use the Metric View to access data. With this feature, Instant Query becomes less necessary, as it caters more to advanced users. Regular users can easily perform queries using the Metric View.
|
||||
|
||||

|
||||
|
||||
### Built-in Dashboards
|
||||
|
||||
Nightingale includes commonly used dashboards that can be imported and used directly. You can also import Grafana dashboards, although compatibility is limited to basic Grafana charts. If you’re accustomed to Grafana, it’s recommended to continue using it for visualization, with Nightingale serving as an alerting engine.
|
||||
|
||||

|
||||
|
||||
### Built-in Alert Rules
|
||||
|
||||
In addition to the built-in dashboards, Nightingale also comes with numerous alert rules that are ready to use out of the box.
|
||||
|
||||

|
||||
|
||||
|
||||
|
||||
## Architecture
|
||||
|
||||
<img src="doc/img/arch-product.png" width="600">
|
||||
In most community scenarios, Nightingale is primarily used as an alert engine, integrating with multiple time-series databases to unify alert rule management. Grafana remains the preferred tool for visualization. As an alert engine, the product architecture of Nightingale is as follows:
|
||||
|
||||
Nightingale monitoring can receive monitoring data reported by various collectors (such as [Categraf](https://github.com/flashcatcloud/categraf) , telegraf, grafana-agent, Prometheus, etc.) and write them to various popular time-series databases (such as Prometheus, M3DB, VictoriaMetrics, Thanos, TDEngine, etc.). It provides configuration capabilities for alert rules, silence rules, and subscription rules, as well as the ability to view monitoring data. It also provides automatic alarm self-healing mechanisms (such as automatically calling back to a webhook address or executing a script after an alarm is triggered), and the ability to store and manage historical alarm events and view them in groups.
|
||||

|
||||
|
||||
If the performance of a standalone time-series database (such as Prometheus) has bottlenecks or poor disaster recovery, we recommend using [VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics). The VictoriaMetrics architecture is relatively simple, has excellent performance, and is easy to deploy and maintain. The architecture diagram is as shown above. For more detailed documentation on VictoriaMetrics, please refer to its [official website](https://victoriametrics.com/).
|
||||
For certain edge data centers with poor network connectivity to the central Nightingale server, we offer a distributed deployment mode for the alert engine. In this mode, even if the network is disconnected, the alerting functionality remains unaffected.
|
||||
|
||||
**We welcome you to participate in the Nightingale open-source project and community in various ways, including but not limited to**:
|
||||
- Adding and improving documentation => [n9e.github.io](https://n9e.github.io/)
|
||||
- Sharing your best practices and experience in using Nightingale monitoring => [Article sharing]((https://n9e.github.io/docs/prologue/share/))
|
||||
- Submitting product suggestions => [github issue](https://github.com/ccfos/nightingale/issues/new?assignees=&labels=kind%2Ffeature&template=enhancement.md)
|
||||
- Submitting code to make Nightingale monitoring faster, more stable, and easier to use => [github pull request](https://github.com/didi/nightingale/pulls)
|
||||

|
||||
|
||||
|
||||
**Respecting, recognizing, and recording the work of every contributor** is the first guiding principle of the Nightingale open-source community. We advocate effective questioning, which not only respects the developer's time but also contributes to the accumulation of knowledge in the entire community
|
||||
- Before asking a question, please first refer to the [FAQ](https://www.gitlink.org.cn/ccfos/nightingale/wiki/faq)
|
||||
- We use [GitHub Discussions](https://github.com/ccfos/nightingale/discussions) as the communication forum. You can search and ask questions here.
|
||||
- We also recommend that you join ours [Slack channel](https://n9e-talk.slack.com/) to exchange experiences with other Nightingale users.
|
||||
## Communication Channels
|
||||
|
||||
|
||||
## Who is using Nightingale
|
||||
You can register your usage and share your experience by posting on **[Who is Using Nightingale](https://github.com/ccfos/nightingale/issues/897)**.
|
||||
- **Report Bugs:** It is highly recommended to submit issues via the [Nightingale GitHub Issue tracker](https://github.com/ccfos/nightingale/issues/new?assignees=&labels=kind%2Fbug&projects=&template=bug_report.yml).
|
||||
- **Documentation:** For more information, we recommend thoroughly browsing the [Nightingale Documentation Site](https://flashcat.cloud/docs/content/flashcat-monitor/nightingale-v7/introduction/).
|
||||
|
||||
## Stargazers over time
|
||||
[](https://starchart.cc/ccfos/nightingale)
|
||||
|
||||
## Contributors
|
||||
[](https://star-history.com/#ccfos/nightingale&Date)
|
||||
|
||||
## Community Co-Building
|
||||
|
||||
- ❇️ Please read the [Nightingale Open Source Project and Community Governance Draft](./doc/community-governance.md). We sincerely welcome every user, developer, company, and organization to use Nightingale, actively report bugs, submit feature requests, share best practices, and help build a professional and active open-source community.
|
||||
- ❤️ Nightingale Contributors
|
||||
<a href="https://github.com/ccfos/nightingale/graphs/contributors">
|
||||
<img src="https://contrib.rocks/image?repo=ccfos/nightingale" />
|
||||
</a>
|
||||
|
||||
## License
|
||||
[Apache License V2.0](https://github.com/didi/nightingale/blob/main/LICENSE)
|
||||
- [Apache License V2.0](https://github.com/didi/nightingale/blob/main/LICENSE)
|
||||
|
||||
@@ -32,6 +32,7 @@ type Alerting struct {
|
||||
Timeout int64
|
||||
TemplatesDir string
|
||||
NotifyConcurrency int
|
||||
WebhookBatchSend bool
|
||||
}
|
||||
|
||||
type CallPlugin struct {
|
||||
@@ -59,10 +60,6 @@ func (a *Alert) PreCheck(configDir string) {
|
||||
a.Heartbeat.Interval = 1000
|
||||
}
|
||||
|
||||
if a.Heartbeat.EngineName == "" {
|
||||
a.Heartbeat.EngineName = "default"
|
||||
}
|
||||
|
||||
if a.EngineDelay == 0 {
|
||||
a.EngineDelay = 30
|
||||
}
|
||||
|
||||
@@ -62,6 +62,7 @@ func Initialize(configDir string, cryptoKey string) (func(), error) {
|
||||
userCache := memsto.NewUserCache(ctx, syncStats)
|
||||
userGroupCache := memsto.NewUserGroupCache(ctx, syncStats)
|
||||
taskTplsCache := memsto.NewTaskTplCache(ctx)
|
||||
configCvalCache := memsto.NewCvalCache(ctx, syncStats)
|
||||
|
||||
promClients := prom.NewPromClient(ctx)
|
||||
tdengineClients := tdengine.NewTdengineClient(ctx, config.Alert.Heartbeat)
|
||||
@@ -70,7 +71,8 @@ func Initialize(configDir string, cryptoKey string) (func(), error) {
|
||||
|
||||
Start(config.Alert, config.Pushgw, syncStats, alertStats, externalProcessors, targetCache, busiGroupCache, alertMuteCache, alertRuleCache, notifyConfigCache, taskTplsCache, dsCache, ctx, promClients, tdengineClients, userCache, userGroupCache)
|
||||
|
||||
r := httpx.GinEngine(config.Global.RunMode, config.HTTP)
|
||||
r := httpx.GinEngine(config.Global.RunMode, config.HTTP,
|
||||
configCvalCache.PrintBodyPaths, configCvalCache.PrintAccessLog)
|
||||
rt := router.New(config.HTTP, config.Alert, alertMuteCache, targetCache, busiGroupCache, alertStats, ctx, externalProcessors)
|
||||
|
||||
if config.Ibex.Enable {
|
||||
@@ -112,5 +114,5 @@ func Start(alertc aconf.Alert, pushgwc pconf.Pushgw, syncStats *memsto.Stats, al
|
||||
go consumer.LoopConsume()
|
||||
|
||||
go queue.ReportQueueSize(alertStats)
|
||||
go sender.InitEmailSender(notifyConfigCache)
|
||||
go sender.InitEmailSender(ctx, notifyConfigCache)
|
||||
}
|
||||
|
||||
@@ -5,18 +5,20 @@ import (
|
||||
"math"
|
||||
"strings"
|
||||
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
"github.com/prometheus/common/model"
|
||||
)
|
||||
|
||||
type AnomalyPoint struct {
|
||||
Key string `json:"key"`
|
||||
Labels model.Metric `json:"labels"`
|
||||
Timestamp int64 `json:"timestamp"`
|
||||
Value float64 `json:"value"`
|
||||
Severity int `json:"severity"`
|
||||
Triggered bool `json:"triggered"`
|
||||
Query string `json:"query"`
|
||||
Values string `json:"values"`
|
||||
Key string `json:"key"`
|
||||
Labels model.Metric `json:"labels"`
|
||||
Timestamp int64 `json:"timestamp"`
|
||||
Value float64 `json:"value"`
|
||||
Severity int `json:"severity"`
|
||||
Triggered bool `json:"triggered"`
|
||||
Query string `json:"query"`
|
||||
Values string `json:"values"`
|
||||
RecoverConfig models.RecoverConfig `json:"recover_config"`
|
||||
}
|
||||
|
||||
func NewAnomalyPoint(key string, labels map[string]string, ts int64, value float64, severity int) AnomalyPoint {
|
||||
|
||||
@@ -2,6 +2,7 @@ package common
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"strings"
|
||||
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
)
|
||||
@@ -34,9 +35,9 @@ func MatchGroupsName(groupName string, groupFilter []models.TagFilter) bool {
|
||||
func matchTag(value string, filter models.TagFilter) bool {
|
||||
switch filter.Func {
|
||||
case "==":
|
||||
return filter.Value == value
|
||||
return strings.TrimSpace(filter.Value) == strings.TrimSpace(value)
|
||||
case "!=":
|
||||
return filter.Value != value
|
||||
return strings.TrimSpace(filter.Value) != strings.TrimSpace(value)
|
||||
case "in":
|
||||
_, has := filter.Vset[value]
|
||||
return has
|
||||
|
||||
@@ -100,6 +100,8 @@ func (e *Dispatch) relaodTpls() error {
|
||||
models.Mm: sender.NewSender(models.Mm, tmpTpls),
|
||||
models.Telegram: sender.NewSender(models.Telegram, tmpTpls),
|
||||
models.FeishuCard: sender.NewSender(models.FeishuCard, tmpTpls),
|
||||
models.Lark: sender.NewSender(models.Lark, tmpTpls),
|
||||
models.LarkCard: sender.NewSender(models.LarkCard, tmpTpls),
|
||||
}
|
||||
|
||||
// domain -> Callback()
|
||||
@@ -110,7 +112,9 @@ func (e *Dispatch) relaodTpls() error {
|
||||
models.TelegramDomain: sender.NewCallBacker(models.TelegramDomain, e.targetCache, e.userCache, e.taskTplsCache, tmpTpls),
|
||||
models.FeishuCardDomain: sender.NewCallBacker(models.FeishuCardDomain, e.targetCache, e.userCache, e.taskTplsCache, tmpTpls),
|
||||
models.IbexDomain: sender.NewCallBacker(models.IbexDomain, e.targetCache, e.userCache, e.taskTplsCache, tmpTpls),
|
||||
models.LarkDomain: sender.NewCallBacker(models.LarkDomain, e.targetCache, e.userCache, e.taskTplsCache, tmpTpls),
|
||||
models.DefaultDomain: sender.NewCallBacker(models.DefaultDomain, e.targetCache, e.userCache, e.taskTplsCache, tmpTpls),
|
||||
models.LarkCardDomain: sender.NewCallBacker(models.LarkCardDomain, e.targetCache, e.userCache, e.taskTplsCache, tmpTpls),
|
||||
}
|
||||
|
||||
e.RwLock.RLock()
|
||||
@@ -163,7 +167,7 @@ func (e *Dispatch) HandleEventNotify(event *models.AlertCurEvent, isSubscribe bo
|
||||
}
|
||||
|
||||
// 处理事件发送,这里用一个goroutine处理一个event的所有发送事件
|
||||
go e.Send(rule, event, notifyTarget)
|
||||
go e.Send(rule, event, notifyTarget, isSubscribe)
|
||||
|
||||
// 如果是不是订阅规则出现的event, 则需要处理订阅规则的event
|
||||
if !isSubscribe {
|
||||
@@ -234,11 +238,12 @@ func (e *Dispatch) handleSub(sub *models.AlertSubscribe, event models.AlertCurEv
|
||||
e.HandleEventNotify(&event, true)
|
||||
}
|
||||
|
||||
func (e *Dispatch) Send(rule *models.AlertRule, event *models.AlertCurEvent, notifyTarget *NotifyTarget) {
|
||||
func (e *Dispatch) Send(rule *models.AlertRule, event *models.AlertCurEvent, notifyTarget *NotifyTarget, isSubscribe bool) {
|
||||
needSend := e.BeforeSenderHook(event)
|
||||
if needSend {
|
||||
for channel, uids := range notifyTarget.ToChannelUserMap() {
|
||||
msgCtx := sender.BuildMessageContext(rule, []*models.AlertCurEvent{event}, uids, e.userCache, e.Astats)
|
||||
msgCtx := sender.BuildMessageContext(e.ctx, rule, []*models.AlertCurEvent{event},
|
||||
uids, e.userCache, e.Astats)
|
||||
e.RwLock.RLock()
|
||||
s := e.Senders[channel]
|
||||
e.RwLock.RUnlock()
|
||||
@@ -261,22 +266,38 @@ func (e *Dispatch) Send(rule *models.AlertRule, event *models.AlertCurEvent, not
|
||||
e.SendCallbacks(rule, notifyTarget, event)
|
||||
|
||||
// handle global webhooks
|
||||
sender.SendWebhooks(notifyTarget.ToWebhookList(), event, e.Astats)
|
||||
if e.alerting.WebhookBatchSend {
|
||||
sender.BatchSendWebhooks(e.ctx, notifyTarget.ToWebhookList(), event, e.Astats)
|
||||
} else {
|
||||
sender.SingleSendWebhooks(e.ctx, notifyTarget.ToWebhookList(), event, e.Astats)
|
||||
}
|
||||
|
||||
// handle plugin call
|
||||
go sender.MayPluginNotify(e.genNoticeBytes(event), e.notifyConfigCache.GetNotifyScript(), e.Astats)
|
||||
go sender.MayPluginNotify(e.ctx, e.genNoticeBytes(event), e.notifyConfigCache.
|
||||
GetNotifyScript(), e.Astats, event)
|
||||
|
||||
if !isSubscribe {
|
||||
// handle ibex callbacks
|
||||
e.HandleIbex(rule, event)
|
||||
}
|
||||
}
|
||||
|
||||
func (e *Dispatch) SendCallbacks(rule *models.AlertRule, notifyTarget *NotifyTarget, event *models.AlertCurEvent) {
|
||||
|
||||
uids := notifyTarget.ToUidList()
|
||||
urls := notifyTarget.ToCallbackList()
|
||||
whMap := notifyTarget.ToWebhookMap()
|
||||
for _, urlStr := range urls {
|
||||
if len(urlStr) == 0 {
|
||||
continue
|
||||
}
|
||||
|
||||
cbCtx := sender.BuildCallBackContext(e.ctx, urlStr, rule, []*models.AlertCurEvent{event}, uids, e.userCache, e.Astats)
|
||||
cbCtx := sender.BuildCallBackContext(e.ctx, urlStr, rule, []*models.AlertCurEvent{event}, uids, e.userCache, e.alerting.WebhookBatchSend, e.Astats)
|
||||
|
||||
if wh, ok := whMap[cbCtx.CallBackURL]; ok && wh.Enable {
|
||||
logger.Debugf("SendCallbacks: webhook[%s] is in global conf.", cbCtx.CallBackURL)
|
||||
continue
|
||||
}
|
||||
|
||||
if strings.HasPrefix(urlStr, "${ibex}") {
|
||||
e.CallBacks[models.IbexDomain].CallBack(cbCtx)
|
||||
@@ -299,6 +320,12 @@ func (e *Dispatch) SendCallbacks(rule *models.AlertRule, notifyTarget *NotifyTar
|
||||
continue
|
||||
}
|
||||
|
||||
// process lark card
|
||||
if parsedURL.Host == models.LarkDomain && parsedURL.Query().Get("card") == "1" {
|
||||
e.CallBacks[models.LarkCardDomain].CallBack(cbCtx)
|
||||
continue
|
||||
}
|
||||
|
||||
callBacker, ok := e.CallBacks[parsedURL.Host]
|
||||
if ok {
|
||||
callBacker.CallBack(cbCtx)
|
||||
@@ -308,6 +335,30 @@ func (e *Dispatch) SendCallbacks(rule *models.AlertRule, notifyTarget *NotifyTar
|
||||
}
|
||||
}
|
||||
|
||||
func (e *Dispatch) HandleIbex(rule *models.AlertRule, event *models.AlertCurEvent) {
|
||||
// 解析 RuleConfig 字段
|
||||
var ruleConfig struct {
|
||||
TaskTpls []*models.Tpl `json:"task_tpls"`
|
||||
}
|
||||
json.Unmarshal([]byte(rule.RuleConfig), &ruleConfig)
|
||||
|
||||
for _, t := range ruleConfig.TaskTpls {
|
||||
if t.TplId == 0 {
|
||||
continue
|
||||
}
|
||||
|
||||
if len(t.Host) == 0 {
|
||||
sender.CallIbex(e.ctx, t.TplId, event.TargetIdent,
|
||||
e.taskTplsCache, e.targetCache, e.userCache, event)
|
||||
continue
|
||||
}
|
||||
for _, host := range t.Host {
|
||||
sender.CallIbex(e.ctx, t.TplId, host,
|
||||
e.taskTplsCache, e.targetCache, e.userCache, event)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
type Notice struct {
|
||||
Event *models.AlertCurEvent `json:"event"`
|
||||
Tpls map[string]string `json:"tpls"`
|
||||
|
||||
@@ -79,13 +79,53 @@ func (s *NotifyTarget) ToCallbackList() []string {
|
||||
func (s *NotifyTarget) ToWebhookList() []*models.Webhook {
|
||||
webhooks := make([]*models.Webhook, 0, len(s.webhooks))
|
||||
for _, wh := range s.webhooks {
|
||||
if wh.Batch == 0 {
|
||||
wh.Batch = 1000
|
||||
}
|
||||
|
||||
if wh.Timeout == 0 {
|
||||
wh.Timeout = 10
|
||||
}
|
||||
|
||||
if wh.RetryCount == 0 {
|
||||
wh.RetryCount = 10
|
||||
}
|
||||
|
||||
if wh.RetryInterval == 0 {
|
||||
wh.RetryInterval = 10
|
||||
}
|
||||
|
||||
webhooks = append(webhooks, wh)
|
||||
}
|
||||
return webhooks
|
||||
}
|
||||
|
||||
func (s *NotifyTarget) ToWebhookMap() map[string]*models.Webhook {
|
||||
webhookMap := make(map[string]*models.Webhook, len(s.webhooks))
|
||||
for _, wh := range s.webhooks {
|
||||
if wh.Batch == 0 {
|
||||
wh.Batch = 1000
|
||||
}
|
||||
|
||||
if wh.Timeout == 0 {
|
||||
wh.Timeout = 10
|
||||
}
|
||||
|
||||
if wh.RetryCount == 0 {
|
||||
wh.RetryCount = 10
|
||||
}
|
||||
|
||||
if wh.RetryInterval == 0 {
|
||||
wh.RetryInterval = 10
|
||||
}
|
||||
|
||||
webhookMap[wh.Url] = wh
|
||||
}
|
||||
return webhookMap
|
||||
}
|
||||
|
||||
func (s *NotifyTarget) ToUidList() []int64 {
|
||||
uids := make([]int64, len(s.userMap))
|
||||
uids := make([]int64, 0, len(s.userMap))
|
||||
for uid, _ := range s.userMap {
|
||||
uids = append(uids, uid)
|
||||
}
|
||||
|
||||
@@ -5,6 +5,7 @@ import (
|
||||
"encoding/json"
|
||||
"fmt"
|
||||
"math"
|
||||
"reflect"
|
||||
"sort"
|
||||
"strings"
|
||||
"time"
|
||||
@@ -46,6 +47,14 @@ const (
|
||||
QUERY_DATA = "query_data"
|
||||
)
|
||||
|
||||
type JoinType string
|
||||
|
||||
const (
|
||||
Left JoinType = "left"
|
||||
Right JoinType = "right"
|
||||
Inner JoinType = "inner"
|
||||
)
|
||||
|
||||
func NewAlertRuleWorker(rule *models.AlertRule, datasourceId int64, processor *process.Processor, promClients *prom.PromClientMap, tdengineClients *tdengine.TdengineClientMap, ctx *ctx.Context) *AlertRuleWorker {
|
||||
arw := &AlertRuleWorker{
|
||||
datasourceId: datasourceId,
|
||||
@@ -143,19 +152,22 @@ func (arw *AlertRuleWorker) Eval() {
|
||||
if p.Severity > point.Severity {
|
||||
hash := process.Hash(cachedRule.Id, arw.processor.DatasourceId(), p)
|
||||
arw.processor.DeleteProcessEvent(hash)
|
||||
models.AlertCurEventDelByHash(arw.ctx, hash)
|
||||
|
||||
pointsMap[tagHash] = point
|
||||
}
|
||||
}
|
||||
|
||||
now := time.Now().Unix()
|
||||
for _, point := range pointsMap {
|
||||
str := fmt.Sprintf("%v", point.Value)
|
||||
arw.processor.RecoverSingle(process.Hash(cachedRule.Id, arw.processor.DatasourceId(), point), point.Timestamp, &str)
|
||||
arw.processor.RecoverSingle(true, process.Hash(cachedRule.Id, arw.processor.DatasourceId(), point), now, &str)
|
||||
}
|
||||
} else {
|
||||
now := time.Now().Unix()
|
||||
for _, point := range recoverPoints {
|
||||
str := fmt.Sprintf("%v", point.Value)
|
||||
arw.processor.RecoverSingle(process.Hash(cachedRule.Id, arw.processor.DatasourceId(), point), point.Timestamp, &str)
|
||||
arw.processor.RecoverSingle(true, process.Hash(cachedRule.Id, arw.processor.DatasourceId(), point), now, &str)
|
||||
}
|
||||
}
|
||||
|
||||
@@ -255,9 +267,12 @@ func (arw *AlertRuleWorker) GetTdengineAnomalyPoint(rule *models.AlertRule, dsId
|
||||
arw.inhibit = ruleQuery.Inhibit
|
||||
if len(ruleQuery.Queries) > 0 {
|
||||
seriesStore := make(map[uint64]models.DataResp)
|
||||
seriesTagIndex := make(map[uint64][]uint64)
|
||||
// 将不同查询的 hash 索引分组存放
|
||||
seriesTagIndexes := make(map[string]map[uint64][]uint64)
|
||||
|
||||
for _, query := range ruleQuery.Queries {
|
||||
seriesTagIndex := make(map[uint64][]uint64)
|
||||
|
||||
arw.processor.Stats.CounterQueryDataTotal.WithLabelValues(fmt.Sprintf("%d", arw.datasourceId)).Inc()
|
||||
cli := arw.tdengineClients.GetCli(dsId)
|
||||
if cli == nil {
|
||||
@@ -267,7 +282,7 @@ func (arw *AlertRuleWorker) GetTdengineAnomalyPoint(rule *models.AlertRule, dsId
|
||||
continue
|
||||
}
|
||||
|
||||
series, err := cli.Query(query)
|
||||
series, err := cli.Query(query,0)
|
||||
arw.processor.Stats.CounterQueryDataTotal.WithLabelValues(fmt.Sprintf("%d", arw.datasourceId)).Inc()
|
||||
if err != nil {
|
||||
logger.Warningf("rule_eval rid:%d query data error: %v", rule.Id, err)
|
||||
@@ -275,13 +290,19 @@ func (arw *AlertRuleWorker) GetTdengineAnomalyPoint(rule *models.AlertRule, dsId
|
||||
arw.processor.Stats.CounterRuleEvalErrorTotal.WithLabelValues(fmt.Sprintf("%v", arw.processor.DatasourceId()), QUERY_DATA).Inc()
|
||||
continue
|
||||
}
|
||||
|
||||
// 此条日志很重要,是告警判断的现场值
|
||||
logger.Debugf("rule_eval rid:%d req:%+v resp:%+v", rule.Id, query, series)
|
||||
MakeSeriesMap(series, seriesTagIndex, seriesStore)
|
||||
ref, err := GetQueryRef(query)
|
||||
if err != nil {
|
||||
logger.Warningf("rule_eval rid:%d query ref error: %v query:%+v", rule.Id, err, query)
|
||||
arw.processor.Stats.CounterRuleEvalErrorTotal.WithLabelValues(fmt.Sprintf("%v", arw.processor.DatasourceId()), GET_RULE_CONFIG).Inc()
|
||||
continue
|
||||
}
|
||||
seriesTagIndexes[ref] = seriesTagIndex
|
||||
}
|
||||
|
||||
points, recoverPoints = GetAnomalyPoint(rule.Id, ruleQuery, seriesTagIndex, seriesStore)
|
||||
points, recoverPoints = GetAnomalyPoint(rule.Id, ruleQuery, seriesTagIndexes, seriesStore)
|
||||
}
|
||||
|
||||
return points, recoverPoints
|
||||
@@ -355,11 +376,6 @@ func (arw *AlertRuleWorker) GetHostAnomalyPoint(ruleConfig string) []common.Anom
|
||||
}
|
||||
m["ident"] = target.Ident
|
||||
|
||||
bg := arw.processor.BusiGroupCache.GetByBusiGroupId(target.GroupId)
|
||||
if bg != nil && bg.LabelEnable == 1 {
|
||||
m["busigroup"] = bg.LabelValue
|
||||
}
|
||||
|
||||
lst = append(lst, common.NewAnomalyPoint(trigger.Type, m, now, float64(now-target.UpdateAt), trigger.Severity))
|
||||
}
|
||||
case "offset":
|
||||
@@ -408,11 +424,6 @@ func (arw *AlertRuleWorker) GetHostAnomalyPoint(ruleConfig string) []common.Anom
|
||||
}
|
||||
m["ident"] = host
|
||||
|
||||
bg := arw.processor.BusiGroupCache.GetByBusiGroupId(target.GroupId)
|
||||
if bg != nil && bg.LabelEnable == 1 {
|
||||
m["busigroup"] = bg.LabelValue
|
||||
}
|
||||
|
||||
lst = append(lst, common.NewAnomalyPoint(trigger.Type, m, now, float64(offset), trigger.Severity))
|
||||
}
|
||||
case "pct_target_miss":
|
||||
@@ -441,17 +452,28 @@ func (arw *AlertRuleWorker) GetHostAnomalyPoint(ruleConfig string) []common.Anom
|
||||
return lst
|
||||
}
|
||||
|
||||
func GetAnomalyPoint(ruleId int64, ruleQuery models.RuleQuery, seriesTagIndex map[uint64][]uint64, seriesStore map[uint64]models.DataResp) ([]common.AnomalyPoint, []common.AnomalyPoint) {
|
||||
func GetAnomalyPoint(ruleId int64, ruleQuery models.RuleQuery, seriesTagIndexes map[string]map[uint64][]uint64, seriesStore map[uint64]models.DataResp) ([]common.AnomalyPoint, []common.AnomalyPoint) {
|
||||
points := []common.AnomalyPoint{}
|
||||
recoverPoints := []common.AnomalyPoint{}
|
||||
|
||||
if len(ruleQuery.Triggers) == 0 {
|
||||
return points, recoverPoints
|
||||
}
|
||||
|
||||
if len(seriesTagIndexes) == 0 {
|
||||
return points, recoverPoints
|
||||
}
|
||||
|
||||
for _, trigger := range ruleQuery.Triggers {
|
||||
// seriesTagIndex 的 key 仅做分组使用,value 为每组 series 的 hash
|
||||
seriesTagIndex := ProcessJoins(ruleId, trigger, seriesTagIndexes, seriesStore)
|
||||
|
||||
for _, seriesHash := range seriesTagIndex {
|
||||
sort.Slice(seriesHash, func(i, j int) bool {
|
||||
return seriesHash[i] < seriesHash[j]
|
||||
})
|
||||
|
||||
m := make(map[string]float64)
|
||||
m := make(map[string]interface{})
|
||||
var ts int64
|
||||
var sample models.DataResp
|
||||
var value float64
|
||||
@@ -491,23 +513,37 @@ func GetAnomalyPoint(ruleId int64, ruleQuery models.RuleQuery, seriesTagIndex ma
|
||||
}
|
||||
|
||||
point := common.AnomalyPoint{
|
||||
Key: sample.MetricName(),
|
||||
Labels: sample.Metric,
|
||||
Timestamp: int64(ts),
|
||||
Value: value,
|
||||
Values: values,
|
||||
Severity: trigger.Severity,
|
||||
Triggered: isTriggered,
|
||||
Query: fmt.Sprintf("query:%+v trigger:%+v", ruleQuery.Queries, trigger),
|
||||
Key: sample.MetricName(),
|
||||
Labels: sample.Metric,
|
||||
Timestamp: int64(ts),
|
||||
Value: value,
|
||||
Values: values,
|
||||
Severity: trigger.Severity,
|
||||
Triggered: isTriggered,
|
||||
Query: fmt.Sprintf("query:%+v trigger:%+v", ruleQuery.Queries, trigger),
|
||||
RecoverConfig: trigger.RecoverConfig,
|
||||
}
|
||||
|
||||
if sample.Query != "" {
|
||||
point.Query = sample.Query
|
||||
}
|
||||
|
||||
// 恢复条件判断经过讨论是只在表达式模式下支持,表达式模式会通过 isTriggered 判断是告警点还是恢复点
|
||||
// 1. 不设置恢复判断,满足恢复条件产生 recoverPoint 恢复,无数据不产生 anomalyPoint 恢复
|
||||
// 2. 设置满足条件才恢复,仅可通过产生 recoverPoint 恢复,不能通过不产生 anomalyPoint 恢复
|
||||
// 3. 设置无数据不恢复,仅可通过产生 recoverPoint 恢复,不产生 anomalyPoint 恢复
|
||||
if isTriggered {
|
||||
points = append(points, point)
|
||||
} else {
|
||||
switch trigger.RecoverConfig.JudgeType {
|
||||
case models.Origin:
|
||||
// 对齐原实现 do nothing
|
||||
case models.RecoverOnCondition:
|
||||
// 额外判断恢复条件,满足才恢复
|
||||
fulfill := parser.Calc(trigger.RecoverConfig.RecoverExp, m)
|
||||
if !fulfill {
|
||||
continue
|
||||
}
|
||||
}
|
||||
recoverPoints = append(recoverPoints, point)
|
||||
}
|
||||
}
|
||||
@@ -516,6 +552,144 @@ func GetAnomalyPoint(ruleId int64, ruleQuery models.RuleQuery, seriesTagIndex ma
|
||||
return points, recoverPoints
|
||||
}
|
||||
|
||||
func flatten(rehashed map[uint64][][]uint64) map[uint64][]uint64 {
|
||||
seriesTagIndex := make(map[uint64][]uint64)
|
||||
var i uint64
|
||||
for _, HashTagIndex := range rehashed {
|
||||
for u := range HashTagIndex {
|
||||
seriesTagIndex[i] = HashTagIndex[u]
|
||||
i++
|
||||
}
|
||||
}
|
||||
return seriesTagIndex
|
||||
}
|
||||
|
||||
// onJoin 组合两个经过 rehash 之后的集合
|
||||
// 如查询 A,经过 on data_base rehash 分组后
|
||||
// [[A1{data_base=1, table=alert},A2{data_base=1, table=alert}],[A5{data_base=1, table=board}]]
|
||||
// [[A3{data_base=2, table=board}],[A4{data_base=2, table=alert}]]
|
||||
// 查询 B,经过 on data_base rehash 分组后
|
||||
// [[B1{data_base=1, table=alert}]]
|
||||
// [[B2{data_base=2, table=alert}]]
|
||||
// 内联得到
|
||||
// [[A1{data_base=1, table=alert},A2{data_base=1, table=alert},B1{data_base=1, table=alert}],[A5{data_base=1, table=board},[B1{data_base=1, table=alert}]]
|
||||
// [[A3{data_base=2, table=board},B2{data_base=2, table=alert}],[A4{data_base=2, table=alert},B2{data_base=2, table=alert}]]
|
||||
func onJoin(reHashTagIndex1 map[uint64][][]uint64, reHashTagIndex2 map[uint64][][]uint64, joinType JoinType) map[uint64][][]uint64 {
|
||||
reHashTagIndex := make(map[uint64][][]uint64)
|
||||
for rehash := range reHashTagIndex1 {
|
||||
if _, ok := reHashTagIndex2[rehash]; ok {
|
||||
// 若有 rehash 相同的记录,两两合并
|
||||
for i1 := range reHashTagIndex1[rehash] {
|
||||
for i2 := range reHashTagIndex2[rehash] {
|
||||
reHashTagIndex[rehash] = append(reHashTagIndex[rehash], mergeNewArray(reHashTagIndex1[rehash][i1], reHashTagIndex2[rehash][i2]))
|
||||
}
|
||||
}
|
||||
} else {
|
||||
// 合并方式不为 inner 时,需要保留 reHashTagIndex1 中未匹配的记录
|
||||
if joinType != Inner {
|
||||
reHashTagIndex[rehash] = reHashTagIndex1[rehash]
|
||||
}
|
||||
}
|
||||
}
|
||||
return reHashTagIndex
|
||||
}
|
||||
|
||||
// rehashSet 重新 hash 分组
|
||||
// 如当前查询 A 有五条记录
|
||||
// A1{data_base=1, table=alert}
|
||||
// A2{data_base=1, table=alert}
|
||||
// A3{data_base=2, table=board}
|
||||
// A4{data_base=2, table=alert}
|
||||
// A5{data_base=1, table=board}
|
||||
// 经过预处理(按曲线分组,此步已在进入 GetAnomalyPoint 函数前完成)后,分为 4 组,
|
||||
// [A1{data_base=1, table=alert},A2{data_base=1, table=alert}]
|
||||
// [A3{data_base=2, table=board}]
|
||||
// [A4{data_base=2, table=alert}]
|
||||
// [A5{data_base=1, table=board}]
|
||||
// 若 rehashSet 按 data_base 重新分组,此时会得到按 rehash 值分的二维数组,即不会将 rehash 值相同的记录完全合并
|
||||
// [[A1{data_base=1, table=alert},A2{data_base=1, table=alert}],[A5{data_base=1, table=board}]]
|
||||
// [[A3{data_base=2, table=board}],[A4{data_base=2, table=alert}]]
|
||||
func rehashSet(seriesTagIndex1 map[uint64][]uint64, seriesStore map[uint64]models.DataResp, on []string) map[uint64][][]uint64 {
|
||||
reHashTagIndex := make(map[uint64][][]uint64)
|
||||
for _, seriesHashes := range seriesTagIndex1 {
|
||||
if len(seriesHashes) == 0 {
|
||||
continue
|
||||
}
|
||||
series, exists := seriesStore[seriesHashes[0]]
|
||||
if !exists {
|
||||
continue
|
||||
}
|
||||
|
||||
rehash := hash.GetTargetTagHash(series.Metric, on)
|
||||
if _, ok := reHashTagIndex[rehash]; !ok {
|
||||
reHashTagIndex[rehash] = make([][]uint64, 0)
|
||||
}
|
||||
reHashTagIndex[rehash] = append(reHashTagIndex[rehash], seriesHashes)
|
||||
}
|
||||
return reHashTagIndex
|
||||
}
|
||||
|
||||
// 笛卡尔积,查询的结果两两合并
|
||||
func cartesianJoin(seriesTagIndex1 map[uint64][]uint64, seriesTagIndex2 map[uint64][]uint64) map[uint64][]uint64 {
|
||||
var index uint64
|
||||
seriesTagIndex := make(map[uint64][]uint64)
|
||||
for _, seriesHashes1 := range seriesTagIndex1 {
|
||||
for _, seriesHashes2 := range seriesTagIndex2 {
|
||||
seriesTagIndex[index] = mergeNewArray(seriesHashes1, seriesHashes2)
|
||||
index++
|
||||
}
|
||||
}
|
||||
return seriesTagIndex
|
||||
}
|
||||
|
||||
// noneJoin 直接拼接
|
||||
func noneJoin(seriesTagIndex1 map[uint64][]uint64, seriesTagIndex2 map[uint64][]uint64) map[uint64][]uint64 {
|
||||
seriesTagIndex := make(map[uint64][]uint64)
|
||||
var index uint64
|
||||
for _, seriesHashes := range seriesTagIndex1 {
|
||||
seriesTagIndex[index] = seriesHashes
|
||||
index++
|
||||
}
|
||||
for _, seriesHashes := range seriesTagIndex2 {
|
||||
seriesTagIndex[index] = seriesHashes
|
||||
index++
|
||||
}
|
||||
return seriesTagIndex
|
||||
}
|
||||
|
||||
// originalJoin 原始分组方案,key 相同,即标签全部相同分为一组
|
||||
func originalJoin(seriesTagIndex1 map[uint64][]uint64, seriesTagIndex2 map[uint64][]uint64) map[uint64][]uint64 {
|
||||
seriesTagIndex := make(map[uint64][]uint64)
|
||||
for tagHash, seriesHashes := range seriesTagIndex1 {
|
||||
if _, ok := seriesTagIndex[tagHash]; !ok {
|
||||
seriesTagIndex[tagHash] = mergeNewArray(seriesHashes)
|
||||
} else {
|
||||
seriesTagIndex[tagHash] = append(seriesTagIndex[tagHash], seriesHashes...)
|
||||
}
|
||||
}
|
||||
|
||||
for tagHash, seriesHashes := range seriesTagIndex2 {
|
||||
if _, ok := seriesTagIndex[tagHash]; !ok {
|
||||
seriesTagIndex[tagHash] = mergeNewArray(seriesHashes)
|
||||
} else {
|
||||
seriesTagIndex[tagHash] = append(seriesTagIndex[tagHash], seriesHashes...)
|
||||
}
|
||||
}
|
||||
|
||||
return seriesTagIndex
|
||||
}
|
||||
|
||||
// exclude 左斥,留下在 reHashTagIndex1 中,但不在 reHashTagIndex2 中的记录
|
||||
func exclude(reHashTagIndex1 map[uint64][][]uint64, reHashTagIndex2 map[uint64][][]uint64) map[uint64][][]uint64 {
|
||||
reHashTagIndex := make(map[uint64][][]uint64)
|
||||
for rehash, _ := range reHashTagIndex1 {
|
||||
if _, ok := reHashTagIndex2[rehash]; !ok {
|
||||
reHashTagIndex[rehash] = reHashTagIndex1[rehash]
|
||||
}
|
||||
}
|
||||
return reHashTagIndex
|
||||
}
|
||||
|
||||
func MakeSeriesMap(series []models.DataResp, seriesTagIndex map[uint64][]uint64, seriesStore map[uint64]models.DataResp) {
|
||||
for i := 0; i < len(series); i++ {
|
||||
serieHash := hash.GetHash(series[i].Metric, series[i].Ref)
|
||||
@@ -529,3 +703,108 @@ func MakeSeriesMap(series []models.DataResp, seriesTagIndex map[uint64][]uint64,
|
||||
seriesTagIndex[tagHash] = append(seriesTagIndex[tagHash], serieHash)
|
||||
}
|
||||
}
|
||||
|
||||
func mergeNewArray(arg ...[]uint64) []uint64 {
|
||||
res := make([]uint64, 0)
|
||||
for _, a := range arg {
|
||||
res = append(res, a...)
|
||||
}
|
||||
return res
|
||||
}
|
||||
|
||||
func ProcessJoins(ruleId int64, trigger models.Trigger, seriesTagIndexes map[string]map[uint64][]uint64, seriesStore map[uint64]models.DataResp) map[uint64][]uint64 {
|
||||
last := make(map[uint64][]uint64)
|
||||
if len(seriesTagIndexes) == 0 {
|
||||
return last
|
||||
}
|
||||
|
||||
if len(trigger.Joins) == 0 {
|
||||
idx := 0
|
||||
for _, seriesTagIndex := range seriesTagIndexes {
|
||||
if idx == 0 {
|
||||
last = seriesTagIndex
|
||||
} else {
|
||||
last = originalJoin(last, seriesTagIndex)
|
||||
}
|
||||
idx++
|
||||
}
|
||||
return last
|
||||
}
|
||||
|
||||
// 有 join 条件,按条件依次合并
|
||||
if len(seriesTagIndexes) < len(trigger.Joins)+1 {
|
||||
logger.Errorf("rule_eval rid:%d queries' count: %d not match join condition's count: %d", ruleId, len(seriesTagIndexes), len(trigger.Joins))
|
||||
return nil
|
||||
}
|
||||
|
||||
last = seriesTagIndexes[trigger.JoinRef]
|
||||
lastRehashed := rehashSet(last, seriesStore, trigger.Joins[0].On)
|
||||
for i := range trigger.Joins {
|
||||
cur := seriesTagIndexes[trigger.Joins[i].Ref]
|
||||
switch trigger.Joins[i].JoinType {
|
||||
case "original":
|
||||
last = originalJoin(last, cur)
|
||||
case "none":
|
||||
last = noneJoin(last, cur)
|
||||
case "cartesian":
|
||||
last = cartesianJoin(last, cur)
|
||||
case "inner_join":
|
||||
curRehashed := rehashSet(cur, seriesStore, trigger.Joins[i].On)
|
||||
lastRehashed = onJoin(lastRehashed, curRehashed, Inner)
|
||||
last = flatten(lastRehashed)
|
||||
case "left_join":
|
||||
curRehashed := rehashSet(cur, seriesStore, trigger.Joins[i].On)
|
||||
lastRehashed = onJoin(lastRehashed, curRehashed, Left)
|
||||
last = flatten(lastRehashed)
|
||||
case "right_join":
|
||||
curRehashed := rehashSet(cur, seriesStore, trigger.Joins[i].On)
|
||||
lastRehashed = onJoin(curRehashed, lastRehashed, Right)
|
||||
last = flatten(lastRehashed)
|
||||
case "left_exclude":
|
||||
curRehashed := rehashSet(cur, seriesStore, trigger.Joins[i].On)
|
||||
lastRehashed = exclude(lastRehashed, curRehashed)
|
||||
last = flatten(lastRehashed)
|
||||
case "right_exclude":
|
||||
curRehashed := rehashSet(cur, seriesStore, trigger.Joins[i].On)
|
||||
lastRehashed = exclude(curRehashed, lastRehashed)
|
||||
last = flatten(lastRehashed)
|
||||
default:
|
||||
logger.Warningf("rule_eval rid:%d join type:%s not support", ruleId, trigger.Joins[i].JoinType)
|
||||
}
|
||||
}
|
||||
return last
|
||||
}
|
||||
|
||||
func GetQueryRef(query interface{}) (string, error) {
|
||||
// 首先检查是否为 map
|
||||
if m, ok := query.(map[string]interface{}); ok {
|
||||
if ref, exists := m["ref"]; exists {
|
||||
if refStr, ok := ref.(string); ok {
|
||||
return refStr, nil
|
||||
}
|
||||
return "", fmt.Errorf("ref 字段不是字符串类型")
|
||||
}
|
||||
return "", fmt.Errorf("query 中没有找到 ref 字段")
|
||||
}
|
||||
|
||||
// 如果不是 map,则按原来的方式处理结构体
|
||||
v := reflect.ValueOf(query)
|
||||
if v.Kind() == reflect.Ptr {
|
||||
v = v.Elem()
|
||||
}
|
||||
|
||||
if v.Kind() != reflect.Struct {
|
||||
return "", fmt.Errorf("query not a struct or map")
|
||||
}
|
||||
|
||||
refField := v.FieldByName("Ref")
|
||||
if !refField.IsValid() {
|
||||
return "", fmt.Errorf("not find ref field")
|
||||
}
|
||||
|
||||
if refField.Kind() != reflect.String {
|
||||
return "", fmt.Errorf("ref not a string")
|
||||
}
|
||||
|
||||
return refField.String(), nil
|
||||
}
|
||||
|
||||
271
alert/eval/eval_test.go
Normal file
271
alert/eval/eval_test.go
Normal file
@@ -0,0 +1,271 @@
|
||||
package eval
|
||||
|
||||
import (
|
||||
"reflect"
|
||||
"testing"
|
||||
|
||||
"golang.org/x/exp/slices"
|
||||
)
|
||||
|
||||
var (
|
||||
reHashTagIndex1 = map[uint64][][]uint64{
|
||||
1: {
|
||||
{1, 2}, {3, 4},
|
||||
},
|
||||
2: {
|
||||
{5, 6}, {7, 8},
|
||||
},
|
||||
}
|
||||
reHashTagIndex2 = map[uint64][][]uint64{
|
||||
1: {
|
||||
{9, 10}, {11, 12},
|
||||
},
|
||||
3: {
|
||||
{13, 14}, {15, 16},
|
||||
},
|
||||
}
|
||||
seriesTagIndex1 = map[uint64][]uint64{
|
||||
1: {1, 2, 3, 4},
|
||||
2: {5, 6, 7, 8},
|
||||
}
|
||||
seriesTagIndex2 = map[uint64][]uint64{
|
||||
1: {9, 10, 11, 12},
|
||||
3: {13, 14, 15, 16},
|
||||
}
|
||||
)
|
||||
|
||||
func Test_originalJoin(t *testing.T) {
|
||||
type args struct {
|
||||
seriesTagIndex1 map[uint64][]uint64
|
||||
seriesTagIndex2 map[uint64][]uint64
|
||||
}
|
||||
tests := []struct {
|
||||
name string
|
||||
args args
|
||||
want map[uint64][]uint64
|
||||
}{
|
||||
{
|
||||
name: "original join",
|
||||
args: args{
|
||||
seriesTagIndex1: map[uint64][]uint64{
|
||||
1: {1, 2, 3, 4},
|
||||
2: {5, 6, 7, 8},
|
||||
},
|
||||
seriesTagIndex2: map[uint64][]uint64{
|
||||
1: {9, 10, 11, 12},
|
||||
3: {13, 14, 15, 16},
|
||||
},
|
||||
},
|
||||
want: map[uint64][]uint64{
|
||||
1: {1, 2, 3, 4, 9, 10, 11, 12},
|
||||
2: {5, 6, 7, 8},
|
||||
3: {13, 14, 15, 16},
|
||||
},
|
||||
},
|
||||
}
|
||||
for _, tt := range tests {
|
||||
t.Run(tt.name, func(t *testing.T) {
|
||||
if got := originalJoin(tt.args.seriesTagIndex1, tt.args.seriesTagIndex2); !reflect.DeepEqual(got, tt.want) {
|
||||
t.Errorf("originalJoin() = %v, want %v", got, tt.want)
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
func Test_exclude(t *testing.T) {
|
||||
type args struct {
|
||||
reHashTagIndex1 map[uint64][][]uint64
|
||||
reHashTagIndex2 map[uint64][][]uint64
|
||||
}
|
||||
tests := []struct {
|
||||
name string
|
||||
args args
|
||||
want map[uint64][]uint64
|
||||
}{
|
||||
{
|
||||
name: "left exclude",
|
||||
args: args{
|
||||
reHashTagIndex1: reHashTagIndex1,
|
||||
reHashTagIndex2: reHashTagIndex2,
|
||||
},
|
||||
want: map[uint64][]uint64{
|
||||
0: {5, 6},
|
||||
1: {7, 8},
|
||||
},
|
||||
},
|
||||
{
|
||||
name: "right exclude",
|
||||
args: args{
|
||||
reHashTagIndex1: reHashTagIndex2,
|
||||
reHashTagIndex2: reHashTagIndex1,
|
||||
},
|
||||
want: map[uint64][]uint64{
|
||||
3: {13, 14},
|
||||
4: {15, 16},
|
||||
},
|
||||
},
|
||||
}
|
||||
for _, tt := range tests {
|
||||
t.Run(tt.name, func(t *testing.T) {
|
||||
if got := exclude(tt.args.reHashTagIndex1, tt.args.reHashTagIndex2); !allValueDeepEqual(flatten(got), tt.want) {
|
||||
t.Errorf("exclude() = %v, want %v", got, tt.want)
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
func Test_noneJoin(t *testing.T) {
|
||||
type args struct {
|
||||
seriesTagIndex1 map[uint64][]uint64
|
||||
seriesTagIndex2 map[uint64][]uint64
|
||||
}
|
||||
tests := []struct {
|
||||
name string
|
||||
args args
|
||||
want map[uint64][]uint64
|
||||
}{
|
||||
{
|
||||
name: "none join, direct splicing",
|
||||
args: args{
|
||||
seriesTagIndex1: seriesTagIndex1,
|
||||
seriesTagIndex2: seriesTagIndex2,
|
||||
},
|
||||
want: map[uint64][]uint64{
|
||||
0: {1, 2, 3, 4},
|
||||
1: {5, 6, 7, 8},
|
||||
2: {9, 10, 11, 12},
|
||||
3: {13, 14, 15, 16},
|
||||
},
|
||||
},
|
||||
}
|
||||
for _, tt := range tests {
|
||||
t.Run(tt.name, func(t *testing.T) {
|
||||
if got := noneJoin(tt.args.seriesTagIndex1, tt.args.seriesTagIndex2); !allValueDeepEqual(got, tt.want) {
|
||||
t.Errorf("noneJoin() = %v, want %v", got, tt.want)
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
func Test_cartesianJoin(t *testing.T) {
|
||||
type args struct {
|
||||
seriesTagIndex1 map[uint64][]uint64
|
||||
seriesTagIndex2 map[uint64][]uint64
|
||||
}
|
||||
tests := []struct {
|
||||
name string
|
||||
args args
|
||||
want map[uint64][]uint64
|
||||
}{
|
||||
{
|
||||
name: "cartesian join",
|
||||
args: args{
|
||||
seriesTagIndex1: seriesTagIndex1,
|
||||
seriesTagIndex2: seriesTagIndex2,
|
||||
},
|
||||
want: map[uint64][]uint64{
|
||||
0: {1, 2, 3, 4, 9, 10, 11, 12},
|
||||
1: {5, 6, 7, 8, 9, 10, 11, 12},
|
||||
2: {5, 6, 7, 8, 13, 14, 15, 16},
|
||||
3: {1, 2, 3, 4, 13, 14, 15, 16},
|
||||
},
|
||||
},
|
||||
}
|
||||
for _, tt := range tests {
|
||||
t.Run(tt.name, func(t *testing.T) {
|
||||
if got := cartesianJoin(tt.args.seriesTagIndex1, tt.args.seriesTagIndex2); !allValueDeepEqual(got, tt.want) {
|
||||
t.Errorf("cartesianJoin() = %v, want %v", got, tt.want)
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
func Test_onJoin(t *testing.T) {
|
||||
type args struct {
|
||||
reHashTagIndex1 map[uint64][][]uint64
|
||||
reHashTagIndex2 map[uint64][][]uint64
|
||||
joinType JoinType
|
||||
}
|
||||
tests := []struct {
|
||||
name string
|
||||
args args
|
||||
want map[uint64][]uint64
|
||||
}{
|
||||
{
|
||||
name: "left join",
|
||||
args: args{
|
||||
reHashTagIndex1: reHashTagIndex1,
|
||||
reHashTagIndex2: reHashTagIndex2,
|
||||
joinType: Left,
|
||||
},
|
||||
want: map[uint64][]uint64{
|
||||
1: {1, 2, 9, 10},
|
||||
2: {3, 4, 9, 10},
|
||||
3: {1, 2, 11, 12},
|
||||
4: {3, 4, 11, 12},
|
||||
5: {5, 6},
|
||||
6: {7, 8},
|
||||
},
|
||||
},
|
||||
{
|
||||
name: "right join",
|
||||
args: args{
|
||||
reHashTagIndex1: reHashTagIndex2,
|
||||
reHashTagIndex2: reHashTagIndex1,
|
||||
joinType: Right,
|
||||
},
|
||||
want: map[uint64][]uint64{
|
||||
1: {1, 2, 9, 10},
|
||||
2: {3, 4, 9, 10},
|
||||
3: {1, 2, 11, 12},
|
||||
4: {3, 4, 11, 12},
|
||||
5: {13, 14},
|
||||
6: {15, 16},
|
||||
},
|
||||
},
|
||||
|
||||
{
|
||||
name: "inner join",
|
||||
args: args{
|
||||
reHashTagIndex1: reHashTagIndex1,
|
||||
reHashTagIndex2: reHashTagIndex2,
|
||||
joinType: Inner,
|
||||
},
|
||||
want: map[uint64][]uint64{
|
||||
1: {1, 2, 9, 10},
|
||||
2: {3, 4, 9, 10},
|
||||
3: {1, 2, 11, 12},
|
||||
4: {3, 4, 11, 12},
|
||||
},
|
||||
},
|
||||
}
|
||||
for _, tt := range tests {
|
||||
t.Run(tt.name, func(t *testing.T) {
|
||||
if got := onJoin(tt.args.reHashTagIndex1, tt.args.reHashTagIndex2, tt.args.joinType); !allValueDeepEqual(flatten(got), tt.want) {
|
||||
t.Errorf("onJoin() = %v, want %v", got, tt.want)
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
// allValueDeepEqual 判断 map 的 value 是否相同,不考虑 key
|
||||
func allValueDeepEqual(got, want map[uint64][]uint64) bool {
|
||||
if len(got) != len(want) {
|
||||
return false
|
||||
}
|
||||
for _, v1 := range got {
|
||||
curEqual := false
|
||||
slices.Sort(v1)
|
||||
for _, v2 := range want {
|
||||
slices.Sort(v2)
|
||||
if reflect.DeepEqual(v1, v2) {
|
||||
curEqual = true
|
||||
break
|
||||
}
|
||||
}
|
||||
if !curEqual {
|
||||
return false
|
||||
}
|
||||
}
|
||||
return true
|
||||
}
|
||||
@@ -12,28 +12,28 @@ import (
|
||||
"github.com/toolkits/pkg/logger"
|
||||
)
|
||||
|
||||
func IsMuted(rule *models.AlertRule, event *models.AlertCurEvent, targetCache *memsto.TargetCacheType, alertMuteCache *memsto.AlertMuteCacheType) bool {
|
||||
func IsMuted(rule *models.AlertRule, event *models.AlertCurEvent, targetCache *memsto.TargetCacheType, alertMuteCache *memsto.AlertMuteCacheType) (bool, string) {
|
||||
if rule.Disabled == 1 {
|
||||
return true
|
||||
return true, "rule disabled"
|
||||
}
|
||||
|
||||
if TimeSpanMuteStrategy(rule, event) {
|
||||
return true
|
||||
return true, "rule is not effective for period of time"
|
||||
}
|
||||
|
||||
if IdentNotExistsMuteStrategy(rule, event, targetCache) {
|
||||
return true
|
||||
return true, "ident not exists mute"
|
||||
}
|
||||
|
||||
if BgNotMatchMuteStrategy(rule, event, targetCache) {
|
||||
return true
|
||||
return true, "bg not match mute"
|
||||
}
|
||||
|
||||
if EventMuteStrategy(event, alertMuteCache) {
|
||||
return true
|
||||
return true, "match mute rule"
|
||||
}
|
||||
|
||||
return false
|
||||
return false, ""
|
||||
}
|
||||
|
||||
// TimeSpanMuteStrategy 根据规则配置的告警生效时间段过滤,如果产生的告警不在规则配置的告警生效时间段内,则不告警,即被mute
|
||||
@@ -114,7 +114,7 @@ func BgNotMatchMuteStrategy(rule *models.AlertRule, event *models.AlertCurEvent,
|
||||
target, exists := targetCache.Get(ident)
|
||||
// 对于包含ident的告警事件,check一下ident所属bg和rule所属bg是否相同
|
||||
// 如果告警规则选择了只在本BG生效,那其他BG的机器就不能因此规则产生告警
|
||||
if exists && target.GroupId != rule.GroupId {
|
||||
if exists && !target.MatchGroupId(rule.GroupId) {
|
||||
logger.Debugf("[%s] mute: rule_eval:%d cluster:%s", "BgNotMatchMuteStrategy", rule.Id, event.Cluster)
|
||||
return true
|
||||
}
|
||||
|
||||
@@ -18,6 +18,9 @@ import (
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
"github.com/ccfos/nightingale/v6/pkg/ctx"
|
||||
"github.com/ccfos/nightingale/v6/pkg/tplx"
|
||||
"github.com/ccfos/nightingale/v6/pushgw/writer"
|
||||
|
||||
"github.com/prometheus/prometheus/prompb"
|
||||
"github.com/toolkits/pkg/logger"
|
||||
"github.com/toolkits/pkg/str"
|
||||
)
|
||||
@@ -50,10 +53,11 @@ type Processor struct {
|
||||
datasourceId int64
|
||||
EngineName string
|
||||
|
||||
rule *models.AlertRule
|
||||
fires *AlertCurEventMap
|
||||
pendings *AlertCurEventMap
|
||||
inhibit bool
|
||||
rule *models.AlertRule
|
||||
fires *AlertCurEventMap
|
||||
pendings *AlertCurEventMap
|
||||
pendingsUseByRecover *AlertCurEventMap
|
||||
inhibit bool
|
||||
|
||||
tagsMap map[string]string
|
||||
tagsArr []string
|
||||
@@ -145,9 +149,10 @@ func (p *Processor) Handle(anomalyPoints []common.AnomalyPoint, from string, inh
|
||||
// 如果 event 被 mute 了,本质也是 fire 的状态,这里无论如何都添加到 alertingKeys 中,防止 fire 的事件自动恢复了
|
||||
hash := event.Hash
|
||||
alertingKeys[hash] = struct{}{}
|
||||
if mute.IsMuted(cachedRule, event, p.TargetCache, p.alertMuteCache) {
|
||||
isMuted, detail := mute.IsMuted(cachedRule, event, p.TargetCache, p.alertMuteCache)
|
||||
if isMuted {
|
||||
p.Stats.CounterMuteTotal.WithLabelValues(event.GroupName).Inc()
|
||||
logger.Debugf("rule_eval:%s event:%v is muted", p.Key(), event)
|
||||
logger.Debugf("rule_eval:%s event:%v is muted, detail:%s", p.Key(), event, detail)
|
||||
continue
|
||||
}
|
||||
|
||||
@@ -165,7 +170,9 @@ func (p *Processor) Handle(anomalyPoints []common.AnomalyPoint, from string, inh
|
||||
p.handleEvent(events)
|
||||
}
|
||||
|
||||
p.HandleRecover(alertingKeys, now, inhibit)
|
||||
if from == "inner" {
|
||||
p.HandleRecover(alertingKeys, now, inhibit)
|
||||
}
|
||||
}
|
||||
|
||||
func (p *Processor) BuildEvent(anomalyPoint common.AnomalyPoint, from string, now int64) *models.AlertCurEvent {
|
||||
@@ -206,6 +213,15 @@ func (p *Processor) BuildEvent(anomalyPoint common.AnomalyPoint, from string, no
|
||||
event.Severity = anomalyPoint.Severity
|
||||
event.ExtraConfig = p.rule.ExtraConfigJSON
|
||||
event.PromQl = anomalyPoint.Query
|
||||
event.RecoverConfig = anomalyPoint.RecoverConfig
|
||||
|
||||
if p.target != "" {
|
||||
if pt, exist := p.TargetCache.Get(p.target); exist {
|
||||
event.Target = pt
|
||||
} else {
|
||||
logger.Infof("Target[ident: %s] doesn't exist in cache.", p.target)
|
||||
}
|
||||
}
|
||||
|
||||
if event.TriggerValues != "" && strings.Count(event.TriggerValues, "$") > 1 {
|
||||
// TriggerValues 有多个变量,将多个变量都放到 TriggerValue 中
|
||||
@@ -217,9 +233,57 @@ func (p *Processor) BuildEvent(anomalyPoint common.AnomalyPoint, from string, no
|
||||
} else {
|
||||
event.LastEvalTime = event.TriggerTime
|
||||
}
|
||||
|
||||
// 生成事件之后,立马进程 relabel 处理
|
||||
Relabel(p.rule, event)
|
||||
return event
|
||||
}
|
||||
|
||||
func Relabel(rule *models.AlertRule, event *models.AlertCurEvent) {
|
||||
if rule == nil {
|
||||
return
|
||||
}
|
||||
|
||||
if len(rule.EventRelabelConfig) == 0 {
|
||||
return
|
||||
}
|
||||
|
||||
// need to keep the original label
|
||||
event.OriginalTags = event.Tags
|
||||
event.OriginalTagsJSON = make([]string, len(event.TagsJSON))
|
||||
|
||||
labels := make([]prompb.Label, len(event.TagsJSON))
|
||||
for i, tag := range event.TagsJSON {
|
||||
label := strings.SplitN(tag, "=", 2)
|
||||
event.OriginalTagsJSON[i] = tag
|
||||
labels[i] = prompb.Label{Name: label[0], Value: label[1]}
|
||||
}
|
||||
|
||||
for i := 0; i < len(rule.EventRelabelConfig); i++ {
|
||||
if rule.EventRelabelConfig[i].Replacement == "" {
|
||||
rule.EventRelabelConfig[i].Replacement = "$1"
|
||||
}
|
||||
|
||||
if rule.EventRelabelConfig[i].Separator == "" {
|
||||
rule.EventRelabelConfig[i].Separator = ";"
|
||||
}
|
||||
|
||||
if rule.EventRelabelConfig[i].Regex == "" {
|
||||
rule.EventRelabelConfig[i].Regex = "(.*)"
|
||||
}
|
||||
}
|
||||
|
||||
// relabel process
|
||||
relabels := writer.Process(labels, rule.EventRelabelConfig...)
|
||||
event.TagsJSON = make([]string, len(relabels))
|
||||
event.TagsMap = make(map[string]string, len(relabels))
|
||||
for i, label := range relabels {
|
||||
event.TagsJSON[i] = fmt.Sprintf("%s=%s", label.Name, label.Value)
|
||||
event.TagsMap[label.Name] = label.Value
|
||||
}
|
||||
event.Tags = strings.Join(event.TagsJSON, ",,")
|
||||
}
|
||||
|
||||
func (p *Processor) HandleRecover(alertingKeys map[string]struct{}, now int64, inhibit bool) {
|
||||
for _, hash := range p.pendings.Keys() {
|
||||
if _, has := alertingKeys[hash]; has {
|
||||
@@ -229,7 +293,7 @@ func (p *Processor) HandleRecover(alertingKeys map[string]struct{}, now int64, i
|
||||
}
|
||||
|
||||
hashArr := make([]string, 0, len(alertingKeys))
|
||||
for hash := range p.fires.GetAll() {
|
||||
for hash, _ := range p.fires.GetAll() {
|
||||
if _, has := alertingKeys[hash]; has {
|
||||
continue
|
||||
}
|
||||
@@ -248,7 +312,7 @@ func (p *Processor) HandleRecoverEvent(hashArr []string, now int64, inhibit bool
|
||||
|
||||
if !inhibit {
|
||||
for _, hash := range hashArr {
|
||||
p.RecoverSingle(hash, now, nil)
|
||||
p.RecoverSingle(false, hash, now, nil)
|
||||
}
|
||||
return
|
||||
}
|
||||
@@ -270,16 +334,17 @@ func (p *Processor) HandleRecoverEvent(hashArr []string, now int64, inhibit bool
|
||||
// hash 对应的恢复事件的被抑制了,把之前的事件删除
|
||||
p.fires.Delete(e.Hash)
|
||||
p.pendings.Delete(e.Hash)
|
||||
models.AlertCurEventDelByHash(p.ctx, e.Hash)
|
||||
eventMap[event.Tags] = *event
|
||||
}
|
||||
}
|
||||
|
||||
for _, event := range eventMap {
|
||||
p.RecoverSingle(event.Hash, now, nil)
|
||||
p.RecoverSingle(false, event.Hash, now, nil)
|
||||
}
|
||||
}
|
||||
|
||||
func (p *Processor) RecoverSingle(hash string, now int64, value *string, values ...string) {
|
||||
func (p *Processor) RecoverSingle(byRecover bool, hash string, now int64, value *string, values ...string) {
|
||||
cachedRule := p.rule
|
||||
if cachedRule == nil {
|
||||
return
|
||||
@@ -289,11 +354,28 @@ func (p *Processor) RecoverSingle(hash string, now int64, value *string, values
|
||||
if !has {
|
||||
return
|
||||
}
|
||||
|
||||
// 如果配置了留观时长,就不能立马恢复了
|
||||
if cachedRule.RecoverDuration > 0 && now-event.LastEvalTime < cachedRule.RecoverDuration {
|
||||
if cachedRule.RecoverDuration > 0 {
|
||||
lastPendingEvent, has := p.pendingsUseByRecover.Get(hash)
|
||||
if !has {
|
||||
// 说明没有产生过异常点,就不需要恢复了
|
||||
logger.Debugf("rule_eval:%s event:%v do not has pending event, not recover", p.Key(), event)
|
||||
return
|
||||
}
|
||||
|
||||
if now-lastPendingEvent.LastEvalTime < cachedRule.RecoverDuration {
|
||||
logger.Debugf("rule_eval:%s event:%v not recover", p.Key(), event)
|
||||
return
|
||||
}
|
||||
}
|
||||
|
||||
// 如果设置了恢复条件,则不能在此处恢复,必须依靠 recoverPoint 来恢复
|
||||
if event.RecoverConfig.JudgeType != models.Origin && !byRecover {
|
||||
logger.Debugf("rule_eval:%s event:%v not recover", p.Key(), event)
|
||||
return
|
||||
}
|
||||
|
||||
if value != nil {
|
||||
event.TriggerValue = *value
|
||||
if len(values) > 0 {
|
||||
@@ -305,6 +387,7 @@ func (p *Processor) RecoverSingle(hash string, now int64, value *string, values
|
||||
// 我确实无法分辨,是prom中有值但是未满足阈值所以没返回,还是prom中确实丢了一些点导致没有数据可以返回,尴尬
|
||||
p.fires.Delete(hash)
|
||||
p.pendings.Delete(hash)
|
||||
p.pendingsUseByRecover.Delete(hash)
|
||||
|
||||
// 可能是因为调整了promql才恢复的,所以事件里边要体现最新的promql,否则用户会比较困惑
|
||||
// 当然,其实rule的各个字段都可能发生变化了,都更新一下吧
|
||||
@@ -324,6 +407,13 @@ func (p *Processor) handleEvent(events []*models.AlertCurEvent) {
|
||||
if event == nil {
|
||||
continue
|
||||
}
|
||||
|
||||
if _, has := p.pendingsUseByRecover.Get(event.Hash); has {
|
||||
p.pendingsUseByRecover.UpdateLastEvalTime(event.Hash, event.LastEvalTime)
|
||||
} else {
|
||||
p.pendingsUseByRecover.Set(event.Hash, event)
|
||||
}
|
||||
|
||||
if p.rule.PromForDuration == 0 {
|
||||
fireEvents = append(fireEvents, event)
|
||||
if severity > event.Severity {
|
||||
@@ -332,7 +422,7 @@ func (p *Processor) handleEvent(events []*models.AlertCurEvent) {
|
||||
continue
|
||||
}
|
||||
|
||||
var preTriggerTime int64
|
||||
var preTriggerTime int64 // 第一个 pending event 的触发时间
|
||||
preEvent, has := p.pendings.Get(event.Hash)
|
||||
if has {
|
||||
p.pendings.UpdateLastEvalTime(event.Hash, event.LastEvalTime)
|
||||
@@ -424,6 +514,7 @@ func (p *Processor) pushEventToQueue(e *models.AlertCurEvent) {
|
||||
|
||||
func (p *Processor) RecoverAlertCurEventFromDb() {
|
||||
p.pendings = NewAlertCurEventMap(nil)
|
||||
p.pendingsUseByRecover = NewAlertCurEventMap(nil)
|
||||
|
||||
curEvents, err := models.AlertCurEventGetByRuleIdAndDsId(p.ctx, p.rule.Id, p.datasourceId)
|
||||
if err != nil {
|
||||
@@ -434,6 +525,7 @@ func (p *Processor) RecoverAlertCurEventFromDb() {
|
||||
}
|
||||
|
||||
fireMap := make(map[string]*models.AlertCurEvent)
|
||||
pendingsUseByRecoverMap := make(map[string]*models.AlertCurEvent)
|
||||
for _, event := range curEvents {
|
||||
if event.Cate == models.HOST {
|
||||
target, exists := p.TargetCache.Get(event.TargetIdent)
|
||||
@@ -444,10 +536,20 @@ func (p *Processor) RecoverAlertCurEventFromDb() {
|
||||
}
|
||||
|
||||
event.DB2Mem()
|
||||
target, exists := p.TargetCache.Get(event.TargetIdent)
|
||||
if exists {
|
||||
event.Target = target
|
||||
}
|
||||
|
||||
fireMap[event.Hash] = event
|
||||
e := *event
|
||||
pendingsUseByRecoverMap[event.Hash] = &e
|
||||
}
|
||||
|
||||
p.fires = NewAlertCurEventMap(fireMap)
|
||||
|
||||
// 修改告警规则,或者进程重启之后,需要重新加载 pendingsUseByRecover
|
||||
p.pendingsUseByRecover = NewAlertCurEventMap(pendingsUseByRecoverMap)
|
||||
}
|
||||
|
||||
func (p *Processor) fillTags(anomalyPoint common.AnomalyPoint) {
|
||||
@@ -523,6 +625,7 @@ func (p *Processor) mayHandleGroup() {
|
||||
func (p *Processor) DeleteProcessEvent(hash string) {
|
||||
p.fires.Delete(hash)
|
||||
p.pendings.Delete(hash)
|
||||
p.pendingsUseByRecover.Delete(hash)
|
||||
}
|
||||
|
||||
func labelMapToArr(m map[string]string) []string {
|
||||
|
||||
@@ -10,6 +10,7 @@ import (
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
"github.com/ccfos/nightingale/v6/prom"
|
||||
"github.com/ccfos/nightingale/v6/pushgw/writer"
|
||||
"github.com/robfig/cron/v3"
|
||||
|
||||
"github.com/toolkits/pkg/logger"
|
||||
"github.com/toolkits/pkg/str"
|
||||
@@ -19,19 +20,35 @@ type RecordRuleContext struct {
|
||||
datasourceId int64
|
||||
quit chan struct{}
|
||||
|
||||
scheduler *cron.Cron
|
||||
rule *models.RecordingRule
|
||||
promClients *prom.PromClientMap
|
||||
stats *astats.Stats
|
||||
}
|
||||
|
||||
func NewRecordRuleContext(rule *models.RecordingRule, datasourceId int64, promClients *prom.PromClientMap, writers *writer.WritersType, stats *astats.Stats) *RecordRuleContext {
|
||||
return &RecordRuleContext{
|
||||
rrc := &RecordRuleContext{
|
||||
datasourceId: datasourceId,
|
||||
quit: make(chan struct{}),
|
||||
rule: rule,
|
||||
promClients: promClients,
|
||||
stats: stats,
|
||||
}
|
||||
|
||||
if rule.CronPattern == "" && rule.PromEvalInterval != 0 {
|
||||
rule.CronPattern = fmt.Sprintf("@every %ds", rule.PromEvalInterval)
|
||||
}
|
||||
|
||||
rrc.scheduler = cron.New(cron.WithSeconds())
|
||||
_, err := rrc.scheduler.AddFunc(rule.CronPattern, func() {
|
||||
rrc.Eval()
|
||||
})
|
||||
|
||||
if err != nil {
|
||||
logger.Errorf("add cron pattern error: %v", err)
|
||||
}
|
||||
|
||||
return rrc
|
||||
}
|
||||
|
||||
func (rrc *RecordRuleContext) Key() string {
|
||||
@@ -39,11 +56,12 @@ func (rrc *RecordRuleContext) Key() string {
|
||||
}
|
||||
|
||||
func (rrc *RecordRuleContext) Hash() string {
|
||||
return str.MD5(fmt.Sprintf("%d_%d_%s_%d",
|
||||
return str.MD5(fmt.Sprintf("%d_%s_%s_%d_%s",
|
||||
rrc.rule.Id,
|
||||
rrc.rule.PromEvalInterval,
|
||||
rrc.rule.CronPattern,
|
||||
rrc.rule.PromQl,
|
||||
rrc.datasourceId,
|
||||
rrc.rule.AppendTags,
|
||||
))
|
||||
}
|
||||
|
||||
@@ -51,23 +69,7 @@ func (rrc *RecordRuleContext) Prepare() {}
|
||||
|
||||
func (rrc *RecordRuleContext) Start() {
|
||||
logger.Infof("eval:%s started", rrc.Key())
|
||||
interval := rrc.rule.PromEvalInterval
|
||||
if interval <= 0 {
|
||||
interval = 10
|
||||
}
|
||||
|
||||
ticker := time.NewTicker(time.Duration(interval) * time.Second)
|
||||
go func() {
|
||||
defer ticker.Stop()
|
||||
for {
|
||||
select {
|
||||
case <-rrc.quit:
|
||||
return
|
||||
case <-ticker.C:
|
||||
rrc.Eval()
|
||||
}
|
||||
}
|
||||
}()
|
||||
rrc.scheduler.Start()
|
||||
}
|
||||
|
||||
func (rrc *RecordRuleContext) Eval() {
|
||||
@@ -109,5 +111,8 @@ func (rrc *RecordRuleContext) Eval() {
|
||||
|
||||
func (rrc *RecordRuleContext) Stop() {
|
||||
logger.Infof("%s stopped", rrc.Key())
|
||||
|
||||
c := rrc.scheduler.Stop()
|
||||
<-c.Done()
|
||||
close(rrc.quit)
|
||||
}
|
||||
|
||||
@@ -34,7 +34,7 @@ func (rt *Router) pushEventToQueue(c *gin.Context) {
|
||||
continue
|
||||
}
|
||||
|
||||
arr := strings.Split(pair, "=")
|
||||
arr := strings.SplitN(pair, "=", 2)
|
||||
if len(arr) != 2 {
|
||||
continue
|
||||
}
|
||||
@@ -129,7 +129,7 @@ func (rt *Router) makeEvent(c *gin.Context) {
|
||||
} else {
|
||||
for _, vector := range events[i].AnomalyPoints {
|
||||
readableString := vector.ReadableValue()
|
||||
go ruleWorker.RecoverSingle(process.Hash(events[i].RuleId, events[i].DatasourceId, vector), vector.Timestamp, &readableString)
|
||||
go ruleWorker.RecoverSingle(false, process.Hash(events[i].RuleId, events[i].DatasourceId, vector), vector.Timestamp, &readableString)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
@@ -29,21 +29,24 @@ type (
|
||||
Rule *models.AlertRule
|
||||
Events []*models.AlertCurEvent
|
||||
Stats *astats.Stats
|
||||
BatchSend bool
|
||||
}
|
||||
|
||||
DefaultCallBacker struct{}
|
||||
)
|
||||
|
||||
func BuildCallBackContext(ctx *ctx.Context, callBackURL string, rule *models.AlertRule, events []*models.AlertCurEvent,
|
||||
uids []int64, userCache *memsto.UserCacheType, stats *astats.Stats) CallBackContext {
|
||||
uids []int64, userCache *memsto.UserCacheType, batchSend bool, stats *astats.Stats) CallBackContext {
|
||||
users := userCache.GetByUserIds(uids)
|
||||
|
||||
newCallBackUrl, _ := events[0].ParseURL(callBackURL)
|
||||
return CallBackContext{
|
||||
Ctx: ctx,
|
||||
CallBackURL: callBackURL,
|
||||
CallBackURL: newCallBackUrl,
|
||||
Rule: rule,
|
||||
Events: events,
|
||||
Users: users,
|
||||
BatchSend: batchSend,
|
||||
Stats: stats,
|
||||
}
|
||||
}
|
||||
@@ -95,6 +98,10 @@ func NewCallBacker(
|
||||
// return &MmSender{tpl: tpls[models.Mm]}
|
||||
case models.TelegramDomain:
|
||||
return &TelegramSender{tpl: tpls[models.Telegram]}
|
||||
case models.LarkDomain:
|
||||
return &LarkSender{tpl: tpls[models.Lark]}
|
||||
case models.LarkCardDomain:
|
||||
return &LarkCardSender{tpl: tpls[models.LarkCard]}
|
||||
}
|
||||
|
||||
return nil
|
||||
@@ -107,31 +114,100 @@ func (c *DefaultCallBacker) CallBack(ctx CallBackContext) {
|
||||
|
||||
event := ctx.Events[0]
|
||||
|
||||
if ctx.BatchSend {
|
||||
webhookConf := &models.Webhook{
|
||||
Type: models.RuleCallback,
|
||||
Enable: true,
|
||||
Url: ctx.CallBackURL,
|
||||
Timeout: 5,
|
||||
RetryCount: 3,
|
||||
RetryInterval: 10,
|
||||
Batch: 1000,
|
||||
}
|
||||
|
||||
PushCallbackEvent(ctx.Ctx, webhookConf, event, ctx.Stats)
|
||||
return
|
||||
}
|
||||
|
||||
ctx.Stats.AlertNotifyTotal.WithLabelValues("rule_callback").Inc()
|
||||
resp, code, err := poster.PostJSON(ctx.CallBackURL, 5*time.Second, event, 3)
|
||||
if err != nil {
|
||||
logger.Errorf("event_callback_fail(rule_id=%d url=%s), resp: %s, err: %v, code: %d",
|
||||
event.RuleId, ctx.CallBackURL, string(resp), err, code)
|
||||
logger.Errorf("event_callback_fail(rule_id=%d url=%s), event:%+v, resp: %s, err: %v, code: %d",
|
||||
event.RuleId, ctx.CallBackURL, event, string(resp), err, code)
|
||||
ctx.Stats.AlertNotifyErrorTotal.WithLabelValues("rule_callback").Inc()
|
||||
} else {
|
||||
logger.Infof("event_callback_succ(rule_id=%d url=%s), resp: %s, code: %d",
|
||||
event.RuleId, ctx.CallBackURL, string(resp), code)
|
||||
logger.Infof("event_callback_succ(rule_id=%d url=%s), event:%+v, resp: %s, code: %d",
|
||||
event.RuleId, ctx.CallBackURL, event, string(resp), code)
|
||||
}
|
||||
}
|
||||
|
||||
func doSend(url string, body interface{}, channel string, stats *astats.Stats) {
|
||||
func doSendAndRecord(ctx *ctx.Context, url, token string, body interface{}, channel string,
|
||||
stats *astats.Stats, event *models.AlertCurEvent) {
|
||||
res, err := doSend(url, body, channel, stats)
|
||||
NotifyRecord(ctx, event, channel, token, res, err)
|
||||
}
|
||||
|
||||
func NotifyRecord(ctx *ctx.Context, evt *models.AlertCurEvent, channel, target, res string, err error) {
|
||||
noti := models.NewNotificationRecord(evt, channel, target)
|
||||
if err != nil {
|
||||
noti.SetStatus(models.NotiStatusFailure)
|
||||
noti.SetDetails(err.Error())
|
||||
} else if res != "" {
|
||||
noti.SetDetails(string(res))
|
||||
}
|
||||
|
||||
if !ctx.IsCenter {
|
||||
_, err := poster.PostByUrlsWithResp[int64](ctx, "/v1/n9e/notify-record", noti)
|
||||
if err != nil {
|
||||
logger.Errorf("add noti:%v failed, err: %v", noti, err)
|
||||
}
|
||||
return
|
||||
}
|
||||
|
||||
if err := noti.Add(ctx); err != nil {
|
||||
logger.Errorf("add noti:%v failed, err: %v", noti, err)
|
||||
}
|
||||
}
|
||||
|
||||
func doSend(url string, body interface{}, channel string, stats *astats.Stats) (string, error) {
|
||||
stats.AlertNotifyTotal.WithLabelValues(channel).Inc()
|
||||
|
||||
res, code, err := poster.PostJSON(url, time.Second*5, body, 3)
|
||||
if err != nil {
|
||||
logger.Errorf("%s_sender: result=fail url=%s code=%d error=%v req:%v response=%s", channel, url, code, err, body, string(res))
|
||||
stats.AlertNotifyErrorTotal.WithLabelValues(channel).Inc()
|
||||
} else {
|
||||
logger.Infof("%s_sender: result=succ url=%s code=%d req:%v response=%s", channel, url, code, body, string(res))
|
||||
return "", err
|
||||
}
|
||||
|
||||
logger.Infof("%s_sender: result=succ url=%s code=%d req:%v response=%s", channel, url, code, body, string(res))
|
||||
return string(res), nil
|
||||
}
|
||||
|
||||
type TaskCreateReply struct {
|
||||
Err string `json:"err"`
|
||||
Dat int64 `json:"dat"` // task.id
|
||||
}
|
||||
|
||||
func PushCallbackEvent(ctx *ctx.Context, webhook *models.Webhook, event *models.AlertCurEvent, stats *astats.Stats) {
|
||||
CallbackEventQueueLock.RLock()
|
||||
queue := CallbackEventQueue[webhook.Url]
|
||||
CallbackEventQueueLock.RUnlock()
|
||||
|
||||
if queue == nil {
|
||||
queue = &WebhookQueue{
|
||||
list: NewSafeListLimited(QueueMaxSize),
|
||||
closeCh: make(chan struct{}),
|
||||
}
|
||||
|
||||
CallbackEventQueueLock.Lock()
|
||||
CallbackEventQueue[webhook.Url] = queue
|
||||
CallbackEventQueueLock.Unlock()
|
||||
|
||||
StartConsumer(ctx, queue, webhook.Batch, webhook, stats)
|
||||
}
|
||||
|
||||
succ := queue.list.PushFront(event)
|
||||
if !succ {
|
||||
logger.Warningf("Write channel(%s) full, current channel size: %d event:%v", webhook.Url, queue.list.Len(), event)
|
||||
}
|
||||
}
|
||||
|
||||
@@ -1,9 +1,10 @@
|
||||
package sender
|
||||
|
||||
import (
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
"html/template"
|
||||
"strings"
|
||||
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
)
|
||||
|
||||
type dingtalkMarkdown struct {
|
||||
@@ -35,13 +36,13 @@ func (ds *DingtalkSender) Send(ctx MessageContext) {
|
||||
return
|
||||
}
|
||||
|
||||
urls, ats := ds.extract(ctx.Users)
|
||||
urls, ats, tokens := ds.extract(ctx.Users)
|
||||
if len(urls) == 0 {
|
||||
return
|
||||
}
|
||||
message := BuildTplMessage(models.Dingtalk, ds.tpl, ctx.Events)
|
||||
|
||||
for _, url := range urls {
|
||||
for i, url := range urls {
|
||||
var body dingtalk
|
||||
// NoAt in url
|
||||
if strings.Contains(url, "noat=1") {
|
||||
@@ -66,7 +67,7 @@ func (ds *DingtalkSender) Send(ctx MessageContext) {
|
||||
}
|
||||
}
|
||||
|
||||
doSend(url, body, models.Dingtalk, ctx.Stats)
|
||||
doSendAndRecord(ctx.Ctx, url, tokens[i], body, models.Dingtalk, ctx.Stats, ctx.Events[0])
|
||||
}
|
||||
}
|
||||
|
||||
@@ -96,15 +97,17 @@ func (ds *DingtalkSender) CallBack(ctx CallBackContext) {
|
||||
body.Markdown.Text = message
|
||||
}
|
||||
|
||||
doSend(ctx.CallBackURL, body, models.Dingtalk, ctx.Stats)
|
||||
doSendAndRecord(ctx.Ctx, ctx.CallBackURL, ctx.CallBackURL, body,
|
||||
"callback", ctx.Stats, ctx.Events[0])
|
||||
|
||||
ctx.Stats.AlertNotifyTotal.WithLabelValues("rule_callback").Inc()
|
||||
}
|
||||
|
||||
// extract urls and ats from Users
|
||||
func (ds *DingtalkSender) extract(users []*models.User) ([]string, []string) {
|
||||
func (ds *DingtalkSender) extract(users []*models.User) ([]string, []string, []string) {
|
||||
urls := make([]string, 0, len(users))
|
||||
ats := make([]string, 0, len(users))
|
||||
tokens := make([]string, 0, len(users))
|
||||
|
||||
for _, user := range users {
|
||||
if user.Phone != "" {
|
||||
@@ -116,7 +119,8 @@ func (ds *DingtalkSender) extract(users []*models.User) ([]string, []string) {
|
||||
url = "https://oapi.dingtalk.com/robot/send?access_token=" + token
|
||||
}
|
||||
urls = append(urls, url)
|
||||
tokens = append(tokens, token)
|
||||
}
|
||||
}
|
||||
return urls, ats
|
||||
return urls, ats, tokens
|
||||
}
|
||||
|
||||
@@ -9,13 +9,14 @@ import (
|
||||
"github.com/ccfos/nightingale/v6/alert/aconf"
|
||||
"github.com/ccfos/nightingale/v6/memsto"
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
"github.com/ccfos/nightingale/v6/pkg/ctx"
|
||||
|
||||
"github.com/toolkits/pkg/logger"
|
||||
|
||||
"gopkg.in/gomail.v2"
|
||||
)
|
||||
|
||||
var mailch chan *gomail.Message
|
||||
var mailch chan *EmailContext
|
||||
|
||||
type EmailSender struct {
|
||||
subjectTpl *template.Template
|
||||
@@ -23,6 +24,11 @@ type EmailSender struct {
|
||||
smtp aconf.SMTPConfig
|
||||
}
|
||||
|
||||
type EmailContext struct {
|
||||
event *models.AlertCurEvent
|
||||
mail *gomail.Message
|
||||
}
|
||||
|
||||
func (es *EmailSender) Send(ctx MessageContext) {
|
||||
if len(ctx.Users) == 0 || len(ctx.Events) == 0 {
|
||||
return
|
||||
@@ -36,7 +42,7 @@ func (es *EmailSender) Send(ctx MessageContext) {
|
||||
subject = ctx.Events[0].RuleName
|
||||
}
|
||||
content := BuildTplMessage(models.Email, es.contentTpl, ctx.Events)
|
||||
es.WriteEmail(subject, content, tos)
|
||||
es.WriteEmail(subject, content, tos, ctx.Events[0])
|
||||
|
||||
ctx.Stats.AlertNotifyTotal.WithLabelValues(models.Email).Add(float64(len(tos)))
|
||||
}
|
||||
@@ -73,7 +79,8 @@ func SendEmail(subject, content string, tos []string, stmp aconf.SMTPConfig) err
|
||||
return nil
|
||||
}
|
||||
|
||||
func (es *EmailSender) WriteEmail(subject, content string, tos []string) {
|
||||
func (es *EmailSender) WriteEmail(subject, content string, tos []string,
|
||||
event *models.AlertCurEvent) {
|
||||
m := gomail.NewMessage()
|
||||
|
||||
m.SetHeader("From", es.smtp.From)
|
||||
@@ -81,7 +88,7 @@ func (es *EmailSender) WriteEmail(subject, content string, tos []string) {
|
||||
m.SetHeader("Subject", subject)
|
||||
m.SetBody("text/html", content)
|
||||
|
||||
mailch <- m
|
||||
mailch <- &EmailContext{event, m}
|
||||
}
|
||||
|
||||
func dialSmtp(d *gomail.Dialer) gomail.SendCloser {
|
||||
@@ -104,22 +111,22 @@ func dialSmtp(d *gomail.Dialer) gomail.SendCloser {
|
||||
|
||||
var mailQuit = make(chan struct{})
|
||||
|
||||
func RestartEmailSender(smtp aconf.SMTPConfig) {
|
||||
func RestartEmailSender(ctx *ctx.Context, smtp aconf.SMTPConfig) {
|
||||
// Notify internal start exit
|
||||
mailQuit <- struct{}{}
|
||||
startEmailSender(smtp)
|
||||
startEmailSender(ctx, smtp)
|
||||
}
|
||||
|
||||
var smtpConfig aconf.SMTPConfig
|
||||
|
||||
func InitEmailSender(ncc *memsto.NotifyConfigCacheType) {
|
||||
mailch = make(chan *gomail.Message, 100000)
|
||||
go updateSmtp(ncc)
|
||||
func InitEmailSender(ctx *ctx.Context, ncc *memsto.NotifyConfigCacheType) {
|
||||
mailch = make(chan *EmailContext, 100000)
|
||||
go updateSmtp(ctx, ncc)
|
||||
smtpConfig = ncc.GetSMTP()
|
||||
startEmailSender(smtpConfig)
|
||||
go startEmailSender(ctx, smtpConfig)
|
||||
}
|
||||
|
||||
func updateSmtp(ncc *memsto.NotifyConfigCacheType) {
|
||||
func updateSmtp(ctx *ctx.Context, ncc *memsto.NotifyConfigCacheType) {
|
||||
for {
|
||||
time.Sleep(1 * time.Minute)
|
||||
smtp := ncc.GetSMTP()
|
||||
@@ -127,15 +134,16 @@ func updateSmtp(ncc *memsto.NotifyConfigCacheType) {
|
||||
smtpConfig.Pass != smtp.Pass || smtpConfig.User != smtp.User || smtpConfig.Port != smtp.Port ||
|
||||
smtpConfig.InsecureSkipVerify != smtp.InsecureSkipVerify { //diff
|
||||
smtpConfig = smtp
|
||||
RestartEmailSender(smtp)
|
||||
RestartEmailSender(ctx, smtp)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func startEmailSender(smtp aconf.SMTPConfig) {
|
||||
func startEmailSender(ctx *ctx.Context, smtp aconf.SMTPConfig) {
|
||||
conf := smtp
|
||||
if conf.Host == "" || conf.Port == 0 {
|
||||
logger.Warning("SMTP configurations invalid")
|
||||
<-mailQuit
|
||||
return
|
||||
}
|
||||
logger.Infof("start email sender... conf.Host:%+v,conf.Port:%+v", conf.Host, conf.Port)
|
||||
@@ -167,7 +175,8 @@ func startEmailSender(smtp aconf.SMTPConfig) {
|
||||
}
|
||||
open = true
|
||||
}
|
||||
if err := gomail.Send(s, m); err != nil {
|
||||
var err error
|
||||
if err = gomail.Send(s, m.mail); err != nil {
|
||||
logger.Errorf("email_sender: failed to send: %s", err)
|
||||
|
||||
// close and retry
|
||||
@@ -184,11 +193,16 @@ func startEmailSender(smtp aconf.SMTPConfig) {
|
||||
}
|
||||
open = true
|
||||
|
||||
if err := gomail.Send(s, m); err != nil {
|
||||
if err = gomail.Send(s, m.mail); err != nil {
|
||||
logger.Errorf("email_sender: failed to retry send: %s", err)
|
||||
}
|
||||
} else {
|
||||
logger.Infof("email_sender: result=succ subject=%v to=%v", m.GetHeader("Subject"), m.GetHeader("To"))
|
||||
logger.Infof("email_sender: result=succ subject=%v to=%v",
|
||||
m.mail.GetHeader("Subject"), m.mail.GetHeader("To"))
|
||||
}
|
||||
|
||||
for _, to := range m.mail.GetHeader("To") {
|
||||
NotifyRecord(ctx, m.event, models.Email, to, "", err)
|
||||
}
|
||||
|
||||
size++
|
||||
|
||||
@@ -54,7 +54,8 @@ func (fs *FeishuSender) CallBack(ctx CallBackContext) {
|
||||
},
|
||||
}
|
||||
|
||||
doSend(ctx.CallBackURL, body, models.Feishu, ctx.Stats)
|
||||
doSendAndRecord(ctx.Ctx, ctx.CallBackURL, ctx.CallBackURL, body, "callback",
|
||||
ctx.Stats, ctx.Events[0])
|
||||
ctx.Stats.AlertNotifyTotal.WithLabelValues("rule_callback").Inc()
|
||||
}
|
||||
|
||||
@@ -62,9 +63,9 @@ func (fs *FeishuSender) Send(ctx MessageContext) {
|
||||
if len(ctx.Users) == 0 || len(ctx.Events) == 0 {
|
||||
return
|
||||
}
|
||||
urls, ats := fs.extract(ctx.Users)
|
||||
urls, ats, tokens := fs.extract(ctx.Users)
|
||||
message := BuildTplMessage(models.Feishu, fs.tpl, ctx.Events)
|
||||
for _, url := range urls {
|
||||
for i, url := range urls {
|
||||
body := feishu{
|
||||
Msgtype: "text",
|
||||
Content: feishuContent{
|
||||
@@ -77,13 +78,14 @@ func (fs *FeishuSender) Send(ctx MessageContext) {
|
||||
IsAtAll: false,
|
||||
}
|
||||
}
|
||||
doSend(url, body, models.Feishu, ctx.Stats)
|
||||
doSendAndRecord(ctx.Ctx, url, tokens[i], body, models.Feishu, ctx.Stats, ctx.Events[0])
|
||||
}
|
||||
}
|
||||
|
||||
func (fs *FeishuSender) extract(users []*models.User) ([]string, []string) {
|
||||
func (fs *FeishuSender) extract(users []*models.User) ([]string, []string, []string) {
|
||||
urls := make([]string, 0, len(users))
|
||||
ats := make([]string, 0, len(users))
|
||||
tokens := make([]string, 0, len(users))
|
||||
|
||||
for _, user := range users {
|
||||
if user.Phone != "" {
|
||||
@@ -95,7 +97,8 @@ func (fs *FeishuSender) extract(users []*models.User) ([]string, []string) {
|
||||
url = "https://open.feishu.cn/open-apis/bot/v2/hook/" + token
|
||||
}
|
||||
urls = append(urls, url)
|
||||
tokens = append(tokens, token)
|
||||
}
|
||||
}
|
||||
return urls, ats
|
||||
return urls, ats, tokens
|
||||
}
|
||||
|
||||
@@ -56,8 +56,8 @@ const (
|
||||
Triggered = "triggered"
|
||||
)
|
||||
|
||||
var (
|
||||
body = feishuCard{
|
||||
func createFeishuCardBody() feishuCard {
|
||||
return feishuCard{
|
||||
feishu: feishu{Msgtype: "interactive"},
|
||||
Card: Cards{
|
||||
Config: Conf{
|
||||
@@ -90,14 +90,28 @@ var (
|
||||
},
|
||||
},
|
||||
}
|
||||
)
|
||||
}
|
||||
|
||||
func (fs *FeishuCardSender) CallBack(ctx CallBackContext) {
|
||||
if len(ctx.Events) == 0 || len(ctx.CallBackURL) == 0 {
|
||||
return
|
||||
}
|
||||
|
||||
ats := ExtractAtsParams(ctx.CallBackURL)
|
||||
message := BuildTplMessage(models.FeishuCard, fs.tpl, ctx.Events)
|
||||
|
||||
if len(ats) > 0 {
|
||||
atTags := ""
|
||||
for _, at := range ats {
|
||||
if strings.Contains(at, "@") {
|
||||
atTags += fmt.Sprintf("<at email=\"%s\" ></at>", at)
|
||||
} else {
|
||||
atTags += fmt.Sprintf("<at id=\"%s\" ></at>", at)
|
||||
}
|
||||
}
|
||||
message = atTags + message
|
||||
}
|
||||
|
||||
color := "red"
|
||||
lowerUnicode := strings.ToLower(message)
|
||||
if strings.Count(lowerUnicode, Recovered) > 0 && strings.Count(lowerUnicode, Triggered) > 0 {
|
||||
@@ -107,6 +121,7 @@ func (fs *FeishuCardSender) CallBack(ctx CallBackContext) {
|
||||
}
|
||||
|
||||
SendTitle := fmt.Sprintf("🔔 %s", ctx.Events[0].RuleName)
|
||||
body := createFeishuCardBody()
|
||||
body.Card.Header.Title.Content = SendTitle
|
||||
body.Card.Header.Template = color
|
||||
body.Card.Elements[0].Text.Content = message
|
||||
@@ -120,14 +135,15 @@ func (fs *FeishuCardSender) CallBack(ctx CallBackContext) {
|
||||
}
|
||||
parsedURL.RawQuery = ""
|
||||
|
||||
doSend(parsedURL.String(), body, models.FeishuCard, ctx.Stats)
|
||||
doSendAndRecord(ctx.Ctx, parsedURL.String(), parsedURL.String(), body, "callback",
|
||||
ctx.Stats, ctx.Events[0])
|
||||
}
|
||||
|
||||
func (fs *FeishuCardSender) Send(ctx MessageContext) {
|
||||
if len(ctx.Users) == 0 || len(ctx.Events) == 0 {
|
||||
return
|
||||
}
|
||||
urls, _ := fs.extract(ctx.Users)
|
||||
urls, tokens := fs.extract(ctx.Users)
|
||||
message := BuildTplMessage(models.FeishuCard, fs.tpl, ctx.Events)
|
||||
color := "red"
|
||||
lowerUnicode := strings.ToLower(message)
|
||||
@@ -138,18 +154,20 @@ func (fs *FeishuCardSender) Send(ctx MessageContext) {
|
||||
}
|
||||
|
||||
SendTitle := fmt.Sprintf("🔔 %s", ctx.Events[0].RuleName)
|
||||
body := createFeishuCardBody()
|
||||
body.Card.Header.Title.Content = SendTitle
|
||||
body.Card.Header.Template = color
|
||||
body.Card.Elements[0].Text.Content = message
|
||||
body.Card.Elements[2].Elements[0].Content = SendTitle
|
||||
for _, url := range urls {
|
||||
doSend(url, body, models.FeishuCard, ctx.Stats)
|
||||
for i, url := range urls {
|
||||
doSendAndRecord(ctx.Ctx, url, tokens[i], body, models.FeishuCard,
|
||||
ctx.Stats, ctx.Events[0])
|
||||
}
|
||||
}
|
||||
|
||||
func (fs *FeishuCardSender) extract(users []*models.User) ([]string, []string) {
|
||||
urls := make([]string, 0, len(users))
|
||||
ats := make([]string, 0)
|
||||
tokens := make([]string, 0, len(users))
|
||||
for i := range users {
|
||||
if token, has := users[i].ExtractToken(models.FeishuCard); has {
|
||||
url := token
|
||||
@@ -157,7 +175,8 @@ func (fs *FeishuCardSender) extract(users []*models.User) ([]string, []string) {
|
||||
url = "https://open.feishu.cn/open-apis/bot/v2/hook/" + strings.TrimSpace(token)
|
||||
}
|
||||
urls = append(urls, url)
|
||||
tokens = append(tokens, token)
|
||||
}
|
||||
}
|
||||
return urls, ats
|
||||
return urls, tokens
|
||||
}
|
||||
|
||||
@@ -77,15 +77,20 @@ func (c *IbexCallBacker) handleIbex(ctx *ctx.Context, url string, event *models.
|
||||
return
|
||||
}
|
||||
|
||||
tpl := c.taskTplCache.Get(id)
|
||||
CallIbex(ctx, id, host, c.taskTplCache, c.targetCache, c.userCache, event)
|
||||
}
|
||||
|
||||
func CallIbex(ctx *ctx.Context, id int64, host string,
|
||||
taskTplCache *memsto.TaskTplCache, targetCache *memsto.TargetCacheType,
|
||||
userCache *memsto.UserCacheType, event *models.AlertCurEvent) {
|
||||
tpl := taskTplCache.Get(id)
|
||||
if tpl == nil {
|
||||
logger.Errorf("event_callback_ibex: no such tpl(%d)", id)
|
||||
return
|
||||
}
|
||||
|
||||
// check perm
|
||||
// tpl.GroupId - host - account 三元组校验权限
|
||||
can, err := canDoIbex(tpl.UpdateBy, tpl, host, c.targetCache, c.userCache)
|
||||
can, err := canDoIbex(tpl.UpdateBy, tpl, host, targetCache, userCache)
|
||||
if err != nil {
|
||||
logger.Errorf("event_callback_ibex: check perm fail: %v", err)
|
||||
return
|
||||
@@ -103,7 +108,7 @@ func (c *IbexCallBacker) handleIbex(ctx *ctx.Context, url string, event *models.
|
||||
continue
|
||||
}
|
||||
|
||||
arr := strings.Split(pair, "=")
|
||||
arr := strings.SplitN(pair, "=", 2)
|
||||
if len(arr) != 2 {
|
||||
continue
|
||||
}
|
||||
@@ -176,10 +181,15 @@ func canDoIbex(username string, tpl *models.TaskTpl, host string, targetCache *m
|
||||
return false, nil
|
||||
}
|
||||
|
||||
return target.GroupId == tpl.GroupId, nil
|
||||
return target.MatchGroupId(tpl.GroupId), nil
|
||||
}
|
||||
|
||||
func TaskAdd(f models.TaskForm, authUser string, isCenter bool) (int64, error) {
|
||||
if storage.Cache == nil {
|
||||
logger.Warning("event_callback_ibex: redis cache is nil")
|
||||
return 0, fmt.Errorf("redis cache is nil")
|
||||
}
|
||||
|
||||
hosts := cleanHosts(f.Hosts)
|
||||
if len(hosts) == 0 {
|
||||
return 0, fmt.Errorf("arg(hosts) empty")
|
||||
|
||||
65
alert/sender/lark.go
Normal file
65
alert/sender/lark.go
Normal file
@@ -0,0 +1,65 @@
|
||||
package sender
|
||||
|
||||
import (
|
||||
"html/template"
|
||||
"strings"
|
||||
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
)
|
||||
|
||||
var (
|
||||
_ CallBacker = (*LarkSender)(nil)
|
||||
)
|
||||
|
||||
type LarkSender struct {
|
||||
tpl *template.Template
|
||||
}
|
||||
|
||||
func (lk *LarkSender) CallBack(ctx CallBackContext) {
|
||||
if len(ctx.Events) == 0 || len(ctx.CallBackURL) == 0 {
|
||||
return
|
||||
}
|
||||
|
||||
body := feishu{
|
||||
Msgtype: "text",
|
||||
Content: feishuContent{
|
||||
Text: BuildTplMessage(models.Lark, lk.tpl, ctx.Events),
|
||||
},
|
||||
}
|
||||
|
||||
doSendAndRecord(ctx.Ctx, ctx.CallBackURL, ctx.CallBackURL, body, "callback",
|
||||
ctx.Stats, ctx.Events[0])
|
||||
ctx.Stats.AlertNotifyTotal.WithLabelValues("rule_callback").Inc()
|
||||
}
|
||||
|
||||
func (lk *LarkSender) Send(ctx MessageContext) {
|
||||
if len(ctx.Users) == 0 || len(ctx.Events) == 0 {
|
||||
return
|
||||
}
|
||||
urls := lk.extract(ctx.Users)
|
||||
message := BuildTplMessage(models.Lark, lk.tpl, ctx.Events)
|
||||
for _, url := range urls {
|
||||
body := feishu{
|
||||
Msgtype: "text",
|
||||
Content: feishuContent{
|
||||
Text: message,
|
||||
},
|
||||
}
|
||||
doSend(url, body, models.Lark, ctx.Stats)
|
||||
}
|
||||
}
|
||||
|
||||
func (lk *LarkSender) extract(users []*models.User) []string {
|
||||
urls := make([]string, 0, len(users))
|
||||
|
||||
for _, user := range users {
|
||||
if token, has := user.ExtractToken(models.Lark); has {
|
||||
url := token
|
||||
if !strings.HasPrefix(token, "https://") && !strings.HasPrefix(token, "http://") {
|
||||
url = "https://open.larksuite.com/open-apis/bot/v2/hook/" + token
|
||||
}
|
||||
urls = append(urls, url)
|
||||
}
|
||||
}
|
||||
return urls
|
||||
}
|
||||
101
alert/sender/larkcard.go
Normal file
101
alert/sender/larkcard.go
Normal file
@@ -0,0 +1,101 @@
|
||||
package sender
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"html/template"
|
||||
"net/url"
|
||||
"strings"
|
||||
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
)
|
||||
|
||||
type LarkCardSender struct {
|
||||
tpl *template.Template
|
||||
}
|
||||
|
||||
func (fs *LarkCardSender) CallBack(ctx CallBackContext) {
|
||||
if len(ctx.Events) == 0 || len(ctx.CallBackURL) == 0 {
|
||||
return
|
||||
}
|
||||
|
||||
ats := ExtractAtsParams(ctx.CallBackURL)
|
||||
message := BuildTplMessage(models.LarkCard, fs.tpl, ctx.Events)
|
||||
|
||||
if len(ats) > 0 {
|
||||
atTags := ""
|
||||
for _, at := range ats {
|
||||
if strings.Contains(at, "@") {
|
||||
atTags += fmt.Sprintf("<at email=\"%s\" ></at>", at)
|
||||
} else {
|
||||
atTags += fmt.Sprintf("<at id=\"%s\" ></at>", at)
|
||||
}
|
||||
}
|
||||
message = atTags + message
|
||||
}
|
||||
|
||||
color := "red"
|
||||
lowerUnicode := strings.ToLower(message)
|
||||
if strings.Count(lowerUnicode, Recovered) > 0 && strings.Count(lowerUnicode, Triggered) > 0 {
|
||||
color = "orange"
|
||||
} else if strings.Count(lowerUnicode, Recovered) > 0 {
|
||||
color = "green"
|
||||
}
|
||||
|
||||
SendTitle := fmt.Sprintf("🔔 %s", ctx.Events[0].RuleName)
|
||||
body := createFeishuCardBody()
|
||||
body.Card.Header.Title.Content = SendTitle
|
||||
body.Card.Header.Template = color
|
||||
body.Card.Elements[0].Text.Content = message
|
||||
body.Card.Elements[2].Elements[0].Content = SendTitle
|
||||
|
||||
// This is to be compatible with the Larkcard interface, if with query string parameters, the request will fail
|
||||
// Remove query parameters from the URL,
|
||||
parsedURL, err := url.Parse(ctx.CallBackURL)
|
||||
if err != nil {
|
||||
return
|
||||
}
|
||||
parsedURL.RawQuery = ""
|
||||
|
||||
doSendAndRecord(ctx.Ctx, ctx.CallBackURL, ctx.CallBackURL, body, "callback",
|
||||
ctx.Stats, ctx.Events[0])
|
||||
}
|
||||
|
||||
func (fs *LarkCardSender) Send(ctx MessageContext) {
|
||||
if len(ctx.Users) == 0 || len(ctx.Events) == 0 {
|
||||
return
|
||||
}
|
||||
urls, _ := fs.extract(ctx.Users)
|
||||
message := BuildTplMessage(models.LarkCard, fs.tpl, ctx.Events)
|
||||
color := "red"
|
||||
lowerUnicode := strings.ToLower(message)
|
||||
if strings.Count(lowerUnicode, Recovered) > 0 && strings.Count(lowerUnicode, Triggered) > 0 {
|
||||
color = "orange"
|
||||
} else if strings.Count(lowerUnicode, Recovered) > 0 {
|
||||
color = "green"
|
||||
}
|
||||
|
||||
SendTitle := fmt.Sprintf("🔔 %s", ctx.Events[0].RuleName)
|
||||
body := createFeishuCardBody()
|
||||
body.Card.Header.Title.Content = SendTitle
|
||||
body.Card.Header.Template = color
|
||||
body.Card.Elements[0].Text.Content = message
|
||||
body.Card.Elements[2].Elements[0].Content = SendTitle
|
||||
for _, url := range urls {
|
||||
doSend(url, body, models.LarkCard, ctx.Stats)
|
||||
}
|
||||
}
|
||||
|
||||
func (fs *LarkCardSender) extract(users []*models.User) ([]string, []string) {
|
||||
urls := make([]string, 0, len(users))
|
||||
ats := make([]string, 0)
|
||||
for i := range users {
|
||||
if token, has := users[i].ExtractToken(models.Lark); has {
|
||||
url := token
|
||||
if !strings.HasPrefix(token, "https://") && !strings.HasPrefix(token, "http://") {
|
||||
url = "https://open.larksuite.com/open-apis/bot/v2/hook/" + strings.TrimSpace(token)
|
||||
}
|
||||
urls = append(urls, url)
|
||||
}
|
||||
}
|
||||
return urls, ats
|
||||
}
|
||||
@@ -7,6 +7,7 @@ import (
|
||||
|
||||
"github.com/ccfos/nightingale/v6/alert/astats"
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
"github.com/ccfos/nightingale/v6/pkg/ctx"
|
||||
|
||||
"github.com/toolkits/pkg/logger"
|
||||
)
|
||||
@@ -38,11 +39,11 @@ func (ms *MmSender) Send(ctx MessageContext) {
|
||||
}
|
||||
message := BuildTplMessage(models.Mm, ms.tpl, ctx.Events)
|
||||
|
||||
SendMM(MatterMostMessage{
|
||||
SendMM(ctx.Ctx, MatterMostMessage{
|
||||
Text: message,
|
||||
Tokens: urls,
|
||||
Stats: ctx.Stats,
|
||||
})
|
||||
}, ctx.Events[0])
|
||||
}
|
||||
|
||||
func (ms *MmSender) CallBack(ctx CallBackContext) {
|
||||
@@ -51,11 +52,11 @@ func (ms *MmSender) CallBack(ctx CallBackContext) {
|
||||
}
|
||||
message := BuildTplMessage(models.Mm, ms.tpl, ctx.Events)
|
||||
|
||||
SendMM(MatterMostMessage{
|
||||
SendMM(ctx.Ctx, MatterMostMessage{
|
||||
Text: message,
|
||||
Tokens: []string{ctx.CallBackURL},
|
||||
Stats: ctx.Stats,
|
||||
})
|
||||
}, ctx.Events[0])
|
||||
|
||||
ctx.Stats.AlertNotifyTotal.WithLabelValues("rule_callback").Inc()
|
||||
}
|
||||
@@ -70,7 +71,7 @@ func (ms *MmSender) extract(users []*models.User) []string {
|
||||
return tokens
|
||||
}
|
||||
|
||||
func SendMM(message MatterMostMessage) {
|
||||
func SendMM(ctx *ctx.Context, message MatterMostMessage, event *models.AlertCurEvent) {
|
||||
for i := 0; i < len(message.Tokens); i++ {
|
||||
u, err := url.Parse(message.Tokens[i])
|
||||
if err != nil {
|
||||
@@ -103,7 +104,7 @@ func SendMM(message MatterMostMessage) {
|
||||
Username: username,
|
||||
Text: txt + message.Text,
|
||||
}
|
||||
doSend(ur, body, models.Mm, message.Stats)
|
||||
doSendAndRecord(ctx, ur, message.Tokens[i], body, models.Mm, message.Stats, event)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
@@ -2,26 +2,30 @@ package sender
|
||||
|
||||
import (
|
||||
"bytes"
|
||||
"fmt"
|
||||
"os"
|
||||
"os/exec"
|
||||
"time"
|
||||
|
||||
"github.com/ccfos/nightingale/v6/alert/astats"
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
"github.com/ccfos/nightingale/v6/pkg/ctx"
|
||||
|
||||
"github.com/toolkits/pkg/file"
|
||||
"github.com/toolkits/pkg/logger"
|
||||
"github.com/toolkits/pkg/sys"
|
||||
)
|
||||
|
||||
func MayPluginNotify(noticeBytes []byte, notifyScript models.NotifyScript, stats *astats.Stats) {
|
||||
func MayPluginNotify(ctx *ctx.Context, noticeBytes []byte, notifyScript models.NotifyScript,
|
||||
stats *astats.Stats, event *models.AlertCurEvent) {
|
||||
if len(noticeBytes) == 0 {
|
||||
return
|
||||
}
|
||||
alertingCallScript(noticeBytes, notifyScript, stats)
|
||||
alertingCallScript(ctx, noticeBytes, notifyScript, stats, event)
|
||||
}
|
||||
|
||||
func alertingCallScript(stdinBytes []byte, notifyScript models.NotifyScript, stats *astats.Stats) {
|
||||
func alertingCallScript(ctx *ctx.Context, stdinBytes []byte, notifyScript models.NotifyScript,
|
||||
stats *astats.Stats, event *models.AlertCurEvent) {
|
||||
// not enable or no notify.py? do nothing
|
||||
config := notifyScript
|
||||
if !config.Enable || config.Content == "" {
|
||||
@@ -81,6 +85,7 @@ func alertingCallScript(stdinBytes []byte, notifyScript models.NotifyScript, sta
|
||||
}
|
||||
|
||||
err, isTimeout := sys.WrapTimeout(cmd, time.Duration(config.Timeout)*time.Second)
|
||||
NotifyRecord(ctx, event, channel, cmd.String(), "", buildErr(err, isTimeout))
|
||||
|
||||
if isTimeout {
|
||||
if err == nil {
|
||||
@@ -102,3 +107,11 @@ func alertingCallScript(stdinBytes []byte, notifyScript models.NotifyScript, sta
|
||||
|
||||
logger.Infof("event_script_notify_ok: exec %s output: %s", fpath, buf.String())
|
||||
}
|
||||
|
||||
func buildErr(err error, isTimeout bool) error {
|
||||
if err == nil && !isTimeout {
|
||||
return nil
|
||||
} else {
|
||||
return fmt.Errorf("is_timeout: %v, err: %v", isTimeout, err)
|
||||
}
|
||||
}
|
||||
|
||||
@@ -8,6 +8,7 @@ import (
|
||||
"github.com/ccfos/nightingale/v6/alert/astats"
|
||||
"github.com/ccfos/nightingale/v6/memsto"
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
"github.com/ccfos/nightingale/v6/pkg/ctx"
|
||||
)
|
||||
|
||||
type (
|
||||
@@ -22,6 +23,7 @@ type (
|
||||
Rule *models.AlertRule
|
||||
Events []*models.AlertCurEvent
|
||||
Stats *astats.Stats
|
||||
Ctx *ctx.Context
|
||||
}
|
||||
)
|
||||
|
||||
@@ -41,17 +43,23 @@ func NewSender(key string, tpls map[string]*template.Template, smtp ...aconf.SMT
|
||||
return &MmSender{tpl: tpls[models.Mm]}
|
||||
case models.Telegram:
|
||||
return &TelegramSender{tpl: tpls[models.Telegram]}
|
||||
case models.Lark:
|
||||
return &LarkSender{tpl: tpls[models.Lark]}
|
||||
case models.LarkCard:
|
||||
return &LarkCardSender{tpl: tpls[models.LarkCard]}
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
func BuildMessageContext(rule *models.AlertRule, events []*models.AlertCurEvent, uids []int64, userCache *memsto.UserCacheType, stats *astats.Stats) MessageContext {
|
||||
func BuildMessageContext(ctx *ctx.Context, rule *models.AlertRule, events []*models.AlertCurEvent,
|
||||
uids []int64, userCache *memsto.UserCacheType, stats *astats.Stats) MessageContext {
|
||||
users := userCache.GetByUserIds(uids)
|
||||
return MessageContext{
|
||||
Rule: rule,
|
||||
Events: events,
|
||||
Users: users,
|
||||
Stats: stats,
|
||||
Ctx: ctx,
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
@@ -6,6 +6,7 @@ import (
|
||||
|
||||
"github.com/ccfos/nightingale/v6/alert/astats"
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
"github.com/ccfos/nightingale/v6/pkg/ctx"
|
||||
|
||||
"github.com/toolkits/pkg/logger"
|
||||
)
|
||||
@@ -35,11 +36,11 @@ func (ts *TelegramSender) CallBack(ctx CallBackContext) {
|
||||
}
|
||||
|
||||
message := BuildTplMessage(models.Telegram, ts.tpl, ctx.Events)
|
||||
SendTelegram(TelegramMessage{
|
||||
SendTelegram(ctx.Ctx, TelegramMessage{
|
||||
Text: message,
|
||||
Tokens: []string{ctx.CallBackURL},
|
||||
Stats: ctx.Stats,
|
||||
})
|
||||
}, ctx.Events[0])
|
||||
|
||||
ctx.Stats.AlertNotifyTotal.WithLabelValues("rule_callback").Inc()
|
||||
}
|
||||
@@ -51,11 +52,11 @@ func (ts *TelegramSender) Send(ctx MessageContext) {
|
||||
tokens := ts.extract(ctx.Users)
|
||||
message := BuildTplMessage(models.Telegram, ts.tpl, ctx.Events)
|
||||
|
||||
SendTelegram(TelegramMessage{
|
||||
SendTelegram(ctx.Ctx, TelegramMessage{
|
||||
Text: message,
|
||||
Tokens: tokens,
|
||||
Stats: ctx.Stats,
|
||||
})
|
||||
}, ctx.Events[0])
|
||||
}
|
||||
|
||||
func (ts *TelegramSender) extract(users []*models.User) []string {
|
||||
@@ -68,7 +69,7 @@ func (ts *TelegramSender) extract(users []*models.User) []string {
|
||||
return tokens
|
||||
}
|
||||
|
||||
func SendTelegram(message TelegramMessage) {
|
||||
func SendTelegram(ctx *ctx.Context, message TelegramMessage, event *models.AlertCurEvent) {
|
||||
for i := 0; i < len(message.Tokens); i++ {
|
||||
if !strings.Contains(message.Tokens[i], "/") && !strings.HasPrefix(message.Tokens[i], "https://") {
|
||||
logger.Errorf("telegram_sender: result=fail invalid token=%s", message.Tokens[i])
|
||||
@@ -92,6 +93,6 @@ func SendTelegram(message TelegramMessage) {
|
||||
Text: message.Text,
|
||||
}
|
||||
|
||||
doSend(url, body, models.Telegram, message.Stats)
|
||||
doSendAndRecord(ctx, url, message.Tokens[i], body, models.Telegram, message.Stats, event)
|
||||
}
|
||||
}
|
||||
|
||||
@@ -2,70 +2,184 @@ package sender
|
||||
|
||||
import (
|
||||
"bytes"
|
||||
"crypto/tls"
|
||||
"encoding/json"
|
||||
"fmt"
|
||||
"io"
|
||||
"net/http"
|
||||
"sync"
|
||||
"time"
|
||||
|
||||
"github.com/ccfos/nightingale/v6/alert/astats"
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
"github.com/ccfos/nightingale/v6/pkg/ctx"
|
||||
|
||||
"github.com/toolkits/pkg/logger"
|
||||
)
|
||||
|
||||
func SendWebhooks(webhooks []*models.Webhook, event *models.AlertCurEvent, stats *astats.Stats) {
|
||||
for _, conf := range webhooks {
|
||||
if conf.Url == "" || !conf.Enable {
|
||||
continue
|
||||
}
|
||||
bs, err := json.Marshal(event)
|
||||
if err != nil {
|
||||
continue
|
||||
}
|
||||
func sendWebhook(webhook *models.Webhook, event interface{}, stats *astats.Stats) (bool, string, error) {
|
||||
channel := "webhook"
|
||||
if webhook.Type == models.RuleCallback {
|
||||
channel = "callback"
|
||||
}
|
||||
|
||||
bf := bytes.NewBuffer(bs)
|
||||
conf := webhook
|
||||
if conf.Url == "" || !conf.Enable {
|
||||
return false, "", nil
|
||||
}
|
||||
bs, err := json.Marshal(event)
|
||||
if err != nil {
|
||||
logger.Errorf("%s alertingWebhook failed to marshal event:%+v err:%v", channel, event, err)
|
||||
return false, "", err
|
||||
}
|
||||
|
||||
req, err := http.NewRequest("POST", conf.Url, bf)
|
||||
if err != nil {
|
||||
logger.Warning("alertingWebhook failed to new request", err)
|
||||
continue
|
||||
}
|
||||
bf := bytes.NewBuffer(bs)
|
||||
|
||||
req.Header.Set("Content-Type", "application/json")
|
||||
if conf.BasicAuthUser != "" && conf.BasicAuthPass != "" {
|
||||
req.SetBasicAuth(conf.BasicAuthUser, conf.BasicAuthPass)
|
||||
}
|
||||
req, err := http.NewRequest("POST", conf.Url, bf)
|
||||
if err != nil {
|
||||
logger.Warningf("%s alertingWebhook failed to new reques event:%s err:%v", channel, string(bs), err)
|
||||
return true, "", err
|
||||
}
|
||||
|
||||
if len(conf.Headers) > 0 && len(conf.Headers)%2 == 0 {
|
||||
for i := 0; i < len(conf.Headers); i += 2 {
|
||||
if conf.Headers[i] == "host" || conf.Headers[i] == "Host" {
|
||||
req.Host = conf.Headers[i+1]
|
||||
continue
|
||||
}
|
||||
req.Header.Set(conf.Headers[i], conf.Headers[i+1])
|
||||
req.Header.Set("Content-Type", "application/json")
|
||||
if conf.BasicAuthUser != "" && conf.BasicAuthPass != "" {
|
||||
req.SetBasicAuth(conf.BasicAuthUser, conf.BasicAuthPass)
|
||||
}
|
||||
|
||||
if len(conf.Headers) > 0 && len(conf.Headers)%2 == 0 {
|
||||
for i := 0; i < len(conf.Headers); i += 2 {
|
||||
if conf.Headers[i] == "host" || conf.Headers[i] == "Host" {
|
||||
req.Host = conf.Headers[i+1]
|
||||
continue
|
||||
}
|
||||
req.Header.Set(conf.Headers[i], conf.Headers[i+1])
|
||||
}
|
||||
}
|
||||
insecureSkipVerify := false
|
||||
if webhook != nil {
|
||||
insecureSkipVerify = webhook.SkipVerify
|
||||
}
|
||||
client := http.Client{
|
||||
Timeout: time.Duration(conf.Timeout) * time.Second,
|
||||
Transport: &http.Transport{
|
||||
TLSClientConfig: &tls.Config{InsecureSkipVerify: insecureSkipVerify},
|
||||
},
|
||||
}
|
||||
|
||||
// todo add skip verify
|
||||
client := http.Client{
|
||||
Timeout: time.Duration(conf.Timeout) * time.Second,
|
||||
stats.AlertNotifyTotal.WithLabelValues(channel).Inc()
|
||||
var resp *http.Response
|
||||
var body []byte
|
||||
resp, err = client.Do(req)
|
||||
|
||||
if err != nil {
|
||||
stats.AlertNotifyErrorTotal.WithLabelValues(channel).Inc()
|
||||
logger.Errorf("event_%s_fail, event:%s, url: [%s], error: [%s]", channel, string(bs), conf.Url, err)
|
||||
return true, "", err
|
||||
}
|
||||
|
||||
if resp.Body != nil {
|
||||
defer resp.Body.Close()
|
||||
body, _ = io.ReadAll(resp.Body)
|
||||
}
|
||||
|
||||
if resp.StatusCode == 429 {
|
||||
logger.Errorf("event_%s_fail, url: %s, response code: %d, body: %s event:%s", channel, conf.Url, resp.StatusCode, string(body), string(bs))
|
||||
return true, string(body), fmt.Errorf("status code is 429")
|
||||
}
|
||||
|
||||
logger.Debugf("event_%s_succ, url: %s, response code: %d, body: %s event:%s", channel, conf.Url, resp.StatusCode, string(body), string(bs))
|
||||
return false, string(body), nil
|
||||
}
|
||||
|
||||
func SingleSendWebhooks(ctx *ctx.Context, webhooks []*models.Webhook, event *models.AlertCurEvent, stats *astats.Stats) {
|
||||
for _, conf := range webhooks {
|
||||
retryCount := 0
|
||||
for retryCount < 3 {
|
||||
needRetry, res, err := sendWebhook(conf, event, stats)
|
||||
NotifyRecord(ctx, event, "webhook", conf.Url, res, err)
|
||||
if !needRetry {
|
||||
break
|
||||
}
|
||||
retryCount++
|
||||
time.Sleep(time.Minute * 1 * time.Duration(retryCount))
|
||||
}
|
||||
|
||||
stats.AlertNotifyTotal.WithLabelValues("webhook").Inc()
|
||||
var resp *http.Response
|
||||
resp, err = client.Do(req)
|
||||
if err != nil {
|
||||
stats.AlertNotifyErrorTotal.WithLabelValues("webhook").Inc()
|
||||
logger.Errorf("event_webhook_fail, ruleId: [%d], eventId: [%d], url: [%s], error: [%s]", event.RuleId, event.Id, conf.Url, err)
|
||||
continue
|
||||
}
|
||||
|
||||
var body []byte
|
||||
if resp.Body != nil {
|
||||
defer resp.Body.Close()
|
||||
body, _ = io.ReadAll(resp.Body)
|
||||
}
|
||||
|
||||
logger.Debugf("event_webhook_succ, url: %s, response code: %d, body: %s event:%+v", conf.Url, resp.StatusCode, string(body), event)
|
||||
}
|
||||
}
|
||||
|
||||
func BatchSendWebhooks(ctx *ctx.Context, webhooks []*models.Webhook, event *models.AlertCurEvent, stats *astats.Stats) {
|
||||
for _, conf := range webhooks {
|
||||
logger.Infof("push event:%+v to queue:%v", event, conf)
|
||||
PushEvent(ctx, conf, event, stats)
|
||||
}
|
||||
}
|
||||
|
||||
var EventQueue = make(map[string]*WebhookQueue)
|
||||
var CallbackEventQueue = make(map[string]*WebhookQueue)
|
||||
var CallbackEventQueueLock sync.RWMutex
|
||||
var EventQueueLock sync.RWMutex
|
||||
|
||||
const QueueMaxSize = 100000
|
||||
|
||||
type WebhookQueue struct {
|
||||
list *SafeListLimited
|
||||
closeCh chan struct{}
|
||||
}
|
||||
|
||||
func PushEvent(ctx *ctx.Context, webhook *models.Webhook, event *models.AlertCurEvent, stats *astats.Stats) {
|
||||
EventQueueLock.RLock()
|
||||
queue := EventQueue[webhook.Url]
|
||||
EventQueueLock.RUnlock()
|
||||
|
||||
if queue == nil {
|
||||
queue = &WebhookQueue{
|
||||
list: NewSafeListLimited(QueueMaxSize),
|
||||
closeCh: make(chan struct{}),
|
||||
}
|
||||
|
||||
EventQueueLock.Lock()
|
||||
EventQueue[webhook.Url] = queue
|
||||
EventQueueLock.Unlock()
|
||||
|
||||
StartConsumer(ctx, queue, webhook.Batch, webhook, stats)
|
||||
}
|
||||
|
||||
succ := queue.list.PushFront(event)
|
||||
if !succ {
|
||||
stats.AlertNotifyErrorTotal.WithLabelValues("push_event_queue").Inc()
|
||||
logger.Warningf("Write channel(%s) full, current channel size: %d event:%v", webhook.Url, queue.list.Len(), event)
|
||||
}
|
||||
}
|
||||
|
||||
func StartConsumer(ctx *ctx.Context, queue *WebhookQueue, popSize int, webhook *models.Webhook, stats *astats.Stats) {
|
||||
for {
|
||||
select {
|
||||
case <-queue.closeCh:
|
||||
logger.Infof("event queue:%v closed", queue)
|
||||
return
|
||||
default:
|
||||
events := queue.list.PopBack(popSize)
|
||||
if len(events) == 0 {
|
||||
time.Sleep(time.Millisecond * 400)
|
||||
continue
|
||||
}
|
||||
|
||||
retryCount := 0
|
||||
for retryCount < webhook.RetryCount {
|
||||
needRetry, res, err := sendWebhook(webhook, events, stats)
|
||||
go RecordEvents(ctx, webhook, events, stats, res, err)
|
||||
if !needRetry {
|
||||
break
|
||||
}
|
||||
retryCount++
|
||||
time.Sleep(time.Second * time.Duration(webhook.RetryInterval) * time.Duration(retryCount))
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func RecordEvents(ctx *ctx.Context, webhook *models.Webhook, events []*models.AlertCurEvent, stats *astats.Stats, res string, err error) {
|
||||
for _, event := range events {
|
||||
time.Sleep(time.Millisecond * 10)
|
||||
NotifyRecord(ctx, event, "webhook", webhook.Url, res, err)
|
||||
}
|
||||
}
|
||||
|
||||
111
alert/sender/webhook_queue.go
Normal file
111
alert/sender/webhook_queue.go
Normal file
@@ -0,0 +1,111 @@
|
||||
package sender
|
||||
|
||||
import (
|
||||
"container/list"
|
||||
"sync"
|
||||
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
)
|
||||
|
||||
type SafeList struct {
|
||||
sync.RWMutex
|
||||
L *list.List
|
||||
}
|
||||
|
||||
func NewSafeList() *SafeList {
|
||||
return &SafeList{L: list.New()}
|
||||
}
|
||||
|
||||
func (sl *SafeList) PushFront(v interface{}) *list.Element {
|
||||
sl.Lock()
|
||||
e := sl.L.PushFront(v)
|
||||
sl.Unlock()
|
||||
return e
|
||||
}
|
||||
|
||||
func (sl *SafeList) PushFrontBatch(vs []interface{}) {
|
||||
sl.Lock()
|
||||
for _, item := range vs {
|
||||
sl.L.PushFront(item)
|
||||
}
|
||||
sl.Unlock()
|
||||
}
|
||||
|
||||
func (sl *SafeList) PopBack(max int) []*models.AlertCurEvent {
|
||||
sl.Lock()
|
||||
|
||||
count := sl.L.Len()
|
||||
if count == 0 {
|
||||
sl.Unlock()
|
||||
return []*models.AlertCurEvent{}
|
||||
}
|
||||
|
||||
if count > max {
|
||||
count = max
|
||||
}
|
||||
|
||||
items := make([]*models.AlertCurEvent, 0, count)
|
||||
for i := 0; i < count; i++ {
|
||||
item := sl.L.Remove(sl.L.Back())
|
||||
sample, ok := item.(*models.AlertCurEvent)
|
||||
if ok {
|
||||
items = append(items, sample)
|
||||
}
|
||||
}
|
||||
|
||||
sl.Unlock()
|
||||
return items
|
||||
}
|
||||
|
||||
func (sl *SafeList) RemoveAll() {
|
||||
sl.Lock()
|
||||
sl.L.Init()
|
||||
sl.Unlock()
|
||||
}
|
||||
|
||||
func (sl *SafeList) Len() int {
|
||||
sl.RLock()
|
||||
size := sl.L.Len()
|
||||
sl.RUnlock()
|
||||
return size
|
||||
}
|
||||
|
||||
// SafeList with Limited Size
|
||||
type SafeListLimited struct {
|
||||
maxSize int
|
||||
SL *SafeList
|
||||
}
|
||||
|
||||
func NewSafeListLimited(maxSize int) *SafeListLimited {
|
||||
return &SafeListLimited{SL: NewSafeList(), maxSize: maxSize}
|
||||
}
|
||||
|
||||
func (sll *SafeListLimited) PopBack(max int) []*models.AlertCurEvent {
|
||||
return sll.SL.PopBack(max)
|
||||
}
|
||||
|
||||
func (sll *SafeListLimited) PushFront(v interface{}) bool {
|
||||
if sll.SL.Len() >= sll.maxSize {
|
||||
return false
|
||||
}
|
||||
|
||||
sll.SL.PushFront(v)
|
||||
return true
|
||||
}
|
||||
|
||||
func (sll *SafeListLimited) PushFrontBatch(vs []interface{}) bool {
|
||||
if sll.SL.Len() >= sll.maxSize {
|
||||
return false
|
||||
}
|
||||
|
||||
sll.SL.PushFrontBatch(vs)
|
||||
return true
|
||||
}
|
||||
|
||||
func (sll *SafeListLimited) RemoveAll() {
|
||||
sll.SL.RemoveAll()
|
||||
}
|
||||
|
||||
func (sll *SafeListLimited) Len() int {
|
||||
return sll.SL.Len()
|
||||
}
|
||||
@@ -37,7 +37,8 @@ func (ws *WecomSender) CallBack(ctx CallBackContext) {
|
||||
},
|
||||
}
|
||||
|
||||
doSend(ctx.CallBackURL, body, models.Wecom, ctx.Stats)
|
||||
doSendAndRecord(ctx.Ctx, ctx.CallBackURL, ctx.CallBackURL, body, "callback",
|
||||
ctx.Stats, ctx.Events[0])
|
||||
ctx.Stats.AlertNotifyTotal.WithLabelValues("rule_callback").Inc()
|
||||
}
|
||||
|
||||
@@ -45,21 +46,22 @@ func (ws *WecomSender) Send(ctx MessageContext) {
|
||||
if len(ctx.Users) == 0 || len(ctx.Events) == 0 {
|
||||
return
|
||||
}
|
||||
urls := ws.extract(ctx.Users)
|
||||
urls, tokens := ws.extract(ctx.Users)
|
||||
message := BuildTplMessage(models.Wecom, ws.tpl, ctx.Events)
|
||||
for _, url := range urls {
|
||||
for i, url := range urls {
|
||||
body := wecom{
|
||||
Msgtype: "markdown",
|
||||
Markdown: wecomMarkdown{
|
||||
Content: message,
|
||||
},
|
||||
}
|
||||
doSend(url, body, models.Wecom, ctx.Stats)
|
||||
doSendAndRecord(ctx.Ctx, url, tokens[i], body, models.Wecom, ctx.Stats, ctx.Events[0])
|
||||
}
|
||||
}
|
||||
|
||||
func (ws *WecomSender) extract(users []*models.User) []string {
|
||||
func (ws *WecomSender) extract(users []*models.User) ([]string, []string) {
|
||||
urls := make([]string, 0, len(users))
|
||||
tokens := make([]string, 0, len(users))
|
||||
for _, user := range users {
|
||||
if token, has := user.ExtractToken(models.Wecom); has {
|
||||
url := token
|
||||
@@ -67,7 +69,8 @@ func (ws *WecomSender) extract(users []*models.User) []string {
|
||||
url = "https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=" + token
|
||||
}
|
||||
urls = append(urls, url)
|
||||
tokens = append(tokens, token)
|
||||
}
|
||||
}
|
||||
return urls
|
||||
return urls, tokens
|
||||
}
|
||||
|
||||
@@ -13,6 +13,8 @@ type Center struct {
|
||||
UseFileAssets bool
|
||||
FlashDuty FlashDuty
|
||||
EventHistoryGroupView bool
|
||||
CleanNotifyRecordDay int
|
||||
MigrateBusiGroupLabel bool
|
||||
}
|
||||
|
||||
type Plugin struct {
|
||||
|
||||
@@ -18,20 +18,28 @@ var MetricDesc MetricDescType
|
||||
// GetMetricDesc , if metric is not registered, empty string will be returned
|
||||
func GetMetricDesc(lang, metric string) string {
|
||||
var m map[string]string
|
||||
if lang == "zh" {
|
||||
m = MetricDesc.Zh
|
||||
} else {
|
||||
|
||||
switch lang {
|
||||
case "en":
|
||||
m = MetricDesc.En
|
||||
default:
|
||||
m = MetricDesc.Zh
|
||||
}
|
||||
|
||||
if m != nil {
|
||||
if desc, has := m[metric]; has {
|
||||
if desc, ok := m[metric]; ok {
|
||||
return desc
|
||||
}
|
||||
}
|
||||
|
||||
return MetricDesc.CommonDesc[metric]
|
||||
}
|
||||
if MetricDesc.CommonDesc != nil {
|
||||
if desc, ok := MetricDesc.CommonDesc[metric]; ok {
|
||||
return desc
|
||||
}
|
||||
}
|
||||
|
||||
return ""
|
||||
}
|
||||
func LoadMetricsYaml(configDir, metricsYamlFile string) error {
|
||||
fp := metricsYamlFile
|
||||
if fp == "" {
|
||||
|
||||
@@ -78,6 +78,7 @@ ops:
|
||||
- "/dashboards/del"
|
||||
- "/embedded-dashboards/put"
|
||||
- "/embedded-dashboards"
|
||||
- "/public-dashboards"
|
||||
|
||||
- name: alert
|
||||
cname: 告警规则
|
||||
|
||||
@@ -16,6 +16,7 @@ import (
|
||||
centerrt "github.com/ccfos/nightingale/v6/center/router"
|
||||
"github.com/ccfos/nightingale/v6/center/sso"
|
||||
"github.com/ccfos/nightingale/v6/conf"
|
||||
"github.com/ccfos/nightingale/v6/cron"
|
||||
"github.com/ccfos/nightingale/v6/dumper"
|
||||
"github.com/ccfos/nightingale/v6/memsto"
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
@@ -47,6 +48,10 @@ func Initialize(configDir string, cryptoKey string) (func(), error) {
|
||||
|
||||
cconf.MergeOperationConf()
|
||||
|
||||
if config.Alert.Heartbeat.EngineName == "" {
|
||||
config.Alert.Heartbeat.EngineName = "default"
|
||||
}
|
||||
|
||||
logxClean, err := logx.Init(config.Log)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
@@ -71,7 +76,7 @@ func Initialize(configDir string, cryptoKey string) (func(), error) {
|
||||
return nil, err
|
||||
}
|
||||
|
||||
integration.Init(ctx, config.Center.BuiltinIntegrationsDir)
|
||||
go integration.Init(ctx, config.Center.BuiltinIntegrationsDir)
|
||||
var redis storage.Redis
|
||||
redis, err = storage.NewRedis(config.Redis)
|
||||
if err != nil {
|
||||
@@ -94,6 +99,7 @@ func Initialize(configDir string, cryptoKey string) (func(), error) {
|
||||
userCache := memsto.NewUserCache(ctx, syncStats)
|
||||
userGroupCache := memsto.NewUserGroupCache(ctx, syncStats)
|
||||
taskTplCache := memsto.NewTaskTplCache(ctx)
|
||||
configCvalCache := memsto.NewCvalCache(ctx, syncStats)
|
||||
|
||||
sso := sso.Init(config.Center, ctx, configCache)
|
||||
promClients := prom.NewPromClient(ctx)
|
||||
@@ -106,12 +112,21 @@ func Initialize(configDir string, cryptoKey string) (func(), error) {
|
||||
|
||||
go version.GetGithubVersion()
|
||||
|
||||
go cron.CleanNotifyRecord(ctx, config.Center.CleanNotifyRecordDay)
|
||||
|
||||
alertrtRouter := alertrt.New(config.HTTP, config.Alert, alertMuteCache, targetCache, busiGroupCache, alertStats, ctx, externalProcessors)
|
||||
centerRouter := centerrt.New(config.HTTP, config.Center, config.Alert, cconf.Operations, dsCache, notifyConfigCache, promClients, tdengineClients,
|
||||
centerRouter := centerrt.New(config.HTTP, config.Center, config.Alert, config.Ibex,
|
||||
cconf.Operations, dsCache, notifyConfigCache, promClients, tdengineClients,
|
||||
redis, sso, ctx, metas, idents, targetCache, userCache, userGroupCache)
|
||||
pushgwRouter := pushgwrt.New(config.HTTP, config.Pushgw, config.Alert, targetCache, busiGroupCache, idents, metas, writers, ctx)
|
||||
|
||||
r := httpx.GinEngine(config.Global.RunMode, config.HTTP)
|
||||
go func() {
|
||||
if models.CanMigrateBg(ctx) {
|
||||
models.MigrateBg(ctx, pushgwRouter.Pushgw.BusiGroupLabelKey)
|
||||
}
|
||||
}()
|
||||
|
||||
r := httpx.GinEngine(config.Global.RunMode, config.HTTP, configCvalCache.PrintBodyPaths, configCvalCache.PrintAccessLog)
|
||||
|
||||
centerRouter.Config(r)
|
||||
alertrtRouter.Config(r)
|
||||
|
||||
@@ -16,6 +16,12 @@ import (
|
||||
const SYSTEM = "system"
|
||||
|
||||
func Init(ctx *ctx.Context, builtinIntegrationsDir string) {
|
||||
err := models.InitBuiltinPayloads(ctx)
|
||||
if err != nil {
|
||||
logger.Warning("init old builtinPayloads fail ", err)
|
||||
return
|
||||
}
|
||||
|
||||
fp := builtinIntegrationsDir
|
||||
if fp == "" {
|
||||
fp = path.Join(runner.Cwd, "integrations")
|
||||
@@ -92,6 +98,7 @@ func Init(ctx *ctx.Context, builtinIntegrationsDir string) {
|
||||
logger.Warning("update builtin component fail ", old, err)
|
||||
}
|
||||
}
|
||||
component.ID = old.ID
|
||||
}
|
||||
|
||||
// delete uuid is emtpy
|
||||
@@ -141,13 +148,13 @@ func Init(ctx *ctx.Context, builtinIntegrationsDir string) {
|
||||
|
||||
cate := strings.Replace(f, ".json", "", -1)
|
||||
builtinAlert := models.BuiltinPayload{
|
||||
Component: component.Ident,
|
||||
Type: "alert",
|
||||
Cate: cate,
|
||||
Name: alert.Name,
|
||||
Tags: alert.AppendTags,
|
||||
Content: string(content),
|
||||
UUID: alert.UUID,
|
||||
ComponentID: component.ID,
|
||||
Type: "alert",
|
||||
Cate: cate,
|
||||
Name: alert.Name,
|
||||
Tags: alert.AppendTags,
|
||||
Content: string(content),
|
||||
UUID: alert.UUID,
|
||||
}
|
||||
|
||||
old, err := models.BuiltinPayloadGet(ctx, "uuid = ?", alert.UUID)
|
||||
@@ -165,6 +172,7 @@ func Init(ctx *ctx.Context, builtinIntegrationsDir string) {
|
||||
}
|
||||
|
||||
if old.UpdatedBy == SYSTEM {
|
||||
old.ComponentID = component.ID
|
||||
old.Content = string(content)
|
||||
old.Name = alert.Name
|
||||
old.Tags = alert.AppendTags
|
||||
@@ -231,13 +239,13 @@ func Init(ctx *ctx.Context, builtinIntegrationsDir string) {
|
||||
}
|
||||
|
||||
builtinDashboard := models.BuiltinPayload{
|
||||
Component: component.Ident,
|
||||
Type: "dashboard",
|
||||
Cate: "",
|
||||
Name: dashboard.Name,
|
||||
Tags: dashboard.Tags,
|
||||
Content: string(content),
|
||||
UUID: dashboard.UUID,
|
||||
ComponentID: component.ID,
|
||||
Type: "dashboard",
|
||||
Cate: "",
|
||||
Name: dashboard.Name,
|
||||
Tags: dashboard.Tags,
|
||||
Content: string(content),
|
||||
UUID: dashboard.UUID,
|
||||
}
|
||||
|
||||
old, err := models.BuiltinPayloadGet(ctx, "uuid = ?", dashboard.UUID)
|
||||
@@ -255,6 +263,7 @@ func Init(ctx *ctx.Context, builtinIntegrationsDir string) {
|
||||
}
|
||||
|
||||
if old.UpdatedBy == SYSTEM {
|
||||
old.ComponentID = component.ID
|
||||
old.Content = string(content)
|
||||
old.Name = dashboard.Name
|
||||
old.Tags = dashboard.Tags
|
||||
|
||||
@@ -13,8 +13,10 @@ import (
|
||||
"github.com/ccfos/nightingale/v6/center/cstats"
|
||||
"github.com/ccfos/nightingale/v6/center/metas"
|
||||
"github.com/ccfos/nightingale/v6/center/sso"
|
||||
"github.com/ccfos/nightingale/v6/conf"
|
||||
_ "github.com/ccfos/nightingale/v6/front/statik"
|
||||
"github.com/ccfos/nightingale/v6/memsto"
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
"github.com/ccfos/nightingale/v6/pkg/aop"
|
||||
"github.com/ccfos/nightingale/v6/pkg/ctx"
|
||||
"github.com/ccfos/nightingale/v6/pkg/httpx"
|
||||
@@ -34,6 +36,7 @@ import (
|
||||
type Router struct {
|
||||
HTTP httpx.Config
|
||||
Center cconf.Center
|
||||
Ibex conf.Ibex
|
||||
Alert aconf.Alert
|
||||
Operations cconf.Operation
|
||||
DatasourceCache *memsto.DatasourceCacheType
|
||||
@@ -48,13 +51,20 @@ type Router struct {
|
||||
UserCache *memsto.UserCacheType
|
||||
UserGroupCache *memsto.UserGroupCacheType
|
||||
Ctx *ctx.Context
|
||||
HeartbeatHook HeartbeatHookFunc
|
||||
TargetDeleteHook models.TargetDeleteHookFunc
|
||||
}
|
||||
|
||||
func New(httpConfig httpx.Config, center cconf.Center, alert aconf.Alert, operations cconf.Operation, ds *memsto.DatasourceCacheType, ncc *memsto.NotifyConfigCacheType, pc *prom.PromClientMap, tdendgineClients *tdengine.TdengineClientMap, redis storage.Redis, sso *sso.SsoClient, ctx *ctx.Context, metaSet *metas.Set, idents *idents.Set, tc *memsto.TargetCacheType, uc *memsto.UserCacheType, ugc *memsto.UserGroupCacheType) *Router {
|
||||
func New(httpConfig httpx.Config, center cconf.Center, alert aconf.Alert, ibex conf.Ibex,
|
||||
operations cconf.Operation, ds *memsto.DatasourceCacheType, ncc *memsto.NotifyConfigCacheType,
|
||||
pc *prom.PromClientMap, tdendgineClients *tdengine.TdengineClientMap, redis storage.Redis,
|
||||
sso *sso.SsoClient, ctx *ctx.Context, metaSet *metas.Set, idents *idents.Set,
|
||||
tc *memsto.TargetCacheType, uc *memsto.UserCacheType, ugc *memsto.UserGroupCacheType) *Router {
|
||||
return &Router{
|
||||
HTTP: httpConfig,
|
||||
Center: center,
|
||||
Alert: alert,
|
||||
Ibex: ibex,
|
||||
Operations: operations,
|
||||
DatasourceCache: ds,
|
||||
NotifyConfigCache: ncc,
|
||||
@@ -68,9 +78,15 @@ func New(httpConfig httpx.Config, center cconf.Center, alert aconf.Alert, operat
|
||||
UserCache: uc,
|
||||
UserGroupCache: ugc,
|
||||
Ctx: ctx,
|
||||
HeartbeatHook: func(ident string) map[string]interface{} { return nil },
|
||||
TargetDeleteHook: emptyDeleteHook,
|
||||
}
|
||||
}
|
||||
|
||||
func emptyDeleteHook(ctx *ctx.Context, idents []string) error {
|
||||
return nil
|
||||
}
|
||||
|
||||
func stat() gin.HandlerFunc {
|
||||
return func(c *gin.Context) {
|
||||
start := time.Now()
|
||||
@@ -91,7 +107,9 @@ func languageDetector(i18NHeaderKey string) gin.HandlerFunc {
|
||||
if headerKey != "" {
|
||||
lang := c.GetHeader(headerKey)
|
||||
if lang != "" {
|
||||
if strings.HasPrefix(lang, "zh") {
|
||||
if strings.HasPrefix(lang, "zh_HK") {
|
||||
c.Request.Header.Set("X-Language", "zh_HK")
|
||||
} else if strings.HasPrefix(lang, "zh") {
|
||||
c.Request.Header.Set("X-Language", "zh_CN")
|
||||
} else if strings.HasPrefix(lang, "en") {
|
||||
c.Request.Header.Set("X-Language", "en")
|
||||
@@ -112,7 +130,7 @@ func (rt *Router) configNoRoute(r *gin.Engine, fs *http.FileSystem) {
|
||||
suffix := arr[len(arr)-1]
|
||||
|
||||
switch suffix {
|
||||
case "png", "jpeg", "jpg", "svg", "ico", "gif", "css", "js", "html", "htm", "gz", "zip", "map", "ttf":
|
||||
case "png", "jpeg", "jpg", "svg", "ico", "gif", "css", "js", "html", "htm", "gz", "zip", "map", "ttf", "md":
|
||||
if !rt.Center.UseFileAssets {
|
||||
c.FileFromFS(c.Request.URL.Path, *fs)
|
||||
} else {
|
||||
@@ -269,7 +287,7 @@ func (rt *Router) Config(r *gin.Engine) {
|
||||
pages.POST("/targets/tags", rt.auth(), rt.user(), rt.perm("/targets/put"), rt.targetBindTagsByFE)
|
||||
pages.DELETE("/targets/tags", rt.auth(), rt.user(), rt.perm("/targets/put"), rt.targetUnbindTagsByFE)
|
||||
pages.PUT("/targets/note", rt.auth(), rt.user(), rt.perm("/targets/put"), rt.targetUpdateNote)
|
||||
pages.PUT("/targets/bgid", rt.auth(), rt.user(), rt.perm("/targets/put"), rt.targetUpdateBgid)
|
||||
pages.PUT("/targets/bgids", rt.auth(), rt.user(), rt.perm("/targets/put"), rt.targetBindBgids)
|
||||
|
||||
pages.POST("/builtin-cate-favorite", rt.auth(), rt.user(), rt.builtinCateFavoriteAdd)
|
||||
pages.DELETE("/builtin-cate-favorite/:name", rt.auth(), rt.user(), rt.builtinCateFavoriteDel)
|
||||
@@ -290,6 +308,7 @@ func (rt *Router) Config(r *gin.Engine) {
|
||||
pages.POST("/busi-group/:id/board/:bid/clone", rt.auth(), rt.user(), rt.perm("/dashboards/add"), rt.bgrw(), rt.boardClone)
|
||||
pages.POST("/busi-groups/boards/clones", rt.auth(), rt.user(), rt.perm("/dashboards/add"), rt.boardBatchClone)
|
||||
|
||||
pages.GET("/boards", rt.auth(), rt.user(), rt.boardGetsByBids)
|
||||
pages.GET("/board/:bid", rt.boardGet)
|
||||
pages.GET("/board/:bid/pure", rt.boardPureGet)
|
||||
pages.PUT("/board/:bid", rt.auth(), rt.user(), rt.perm("/dashboards/put"), rt.boardPut)
|
||||
@@ -308,11 +327,16 @@ func (rt *Router) Config(r *gin.Engine) {
|
||||
pages.GET("/busi-group/:id/alert-rules", rt.auth(), rt.user(), rt.perm("/alert-rules"), rt.alertRuleGets)
|
||||
pages.POST("/busi-group/:id/alert-rules", rt.auth(), rt.user(), rt.perm("/alert-rules/add"), rt.bgrw(), rt.alertRuleAddByFE)
|
||||
pages.POST("/busi-group/:id/alert-rules/import", rt.auth(), rt.user(), rt.perm("/alert-rules/add"), rt.bgrw(), rt.alertRuleAddByImport)
|
||||
pages.POST("/busi-group/:id/alert-rules/import-prom-rule", rt.auth(),
|
||||
rt.user(), rt.perm("/alert-rules/add"), rt.bgrw(), rt.alertRuleAddByImportPromRule)
|
||||
pages.DELETE("/busi-group/:id/alert-rules", rt.auth(), rt.user(), rt.perm("/alert-rules/del"), rt.bgrw(), rt.alertRuleDel)
|
||||
pages.PUT("/busi-group/:id/alert-rules/fields", rt.auth(), rt.user(), rt.perm("/alert-rules/put"), rt.bgrw(), rt.alertRulePutFields)
|
||||
pages.PUT("/busi-group/:id/alert-rule/:arid", rt.auth(), rt.user(), rt.perm("/alert-rules/put"), rt.alertRulePutByFE)
|
||||
pages.GET("/alert-rule/:arid", rt.auth(), rt.user(), rt.perm("/alert-rules"), rt.alertRuleGet)
|
||||
pages.GET("/alert-rule/:arid/pure", rt.auth(), rt.user(), rt.perm("/alert-rules"), rt.alertRulePureGet)
|
||||
pages.PUT("/busi-group/alert-rule/validate", rt.auth(), rt.user(), rt.perm("/alert-rules/put"), rt.alertRuleValidation)
|
||||
pages.POST("/relabel-test", rt.auth(), rt.user(), rt.relabelTest)
|
||||
pages.POST("/busi-group/:id/alert-rules/clone", rt.auth(), rt.user(), rt.perm("/alert-rules/add"), rt.bgrw(), rt.cloneToMachine)
|
||||
|
||||
pages.GET("/busi-groups/recording-rules", rt.auth(), rt.user(), rt.perm("/recording-rules"), rt.recordingRuleGetsByGids)
|
||||
pages.GET("/busi-group/:id/recording-rules", rt.auth(), rt.user(), rt.perm("/recording-rules"), rt.recordingRuleGets)
|
||||
@@ -341,9 +365,11 @@ func (rt *Router) Config(r *gin.Engine) {
|
||||
if rt.Center.AnonymousAccess.AlertDetail {
|
||||
pages.GET("/alert-cur-event/:eid", rt.alertCurEventGet)
|
||||
pages.GET("/alert-his-event/:eid", rt.alertHisEventGet)
|
||||
pages.GET("/event-notify-records/:eid", rt.notificationRecordList)
|
||||
} else {
|
||||
pages.GET("/alert-cur-event/:eid", rt.auth(), rt.user(), rt.alertCurEventGet)
|
||||
pages.GET("/alert-his-event/:eid", rt.auth(), rt.user(), rt.alertHisEventGet)
|
||||
pages.GET("/event-notify-records/:eid", rt.auth(), rt.user(), rt.notificationRecordList)
|
||||
}
|
||||
|
||||
// card logic
|
||||
@@ -372,8 +398,8 @@ func (rt *Router) Config(r *gin.Engine) {
|
||||
pages.GET("/busi-group/:id/tasks", rt.auth(), rt.user(), rt.perm("/job-tasks"), rt.bgro(), rt.taskGets)
|
||||
pages.POST("/busi-group/:id/tasks", rt.auth(), rt.user(), rt.perm("/job-tasks/add"), rt.bgrw(), rt.taskAdd)
|
||||
|
||||
pages.GET("/servers", rt.auth(), rt.user(), rt.perm("/help/servers"), rt.serversGet)
|
||||
pages.GET("/server-clusters", rt.auth(), rt.user(), rt.perm("/help/servers"), rt.serverClustersGet)
|
||||
pages.GET("/servers", rt.auth(), rt.user(), rt.serversGet)
|
||||
pages.GET("/server-clusters", rt.auth(), rt.user(), rt.serverClustersGet)
|
||||
|
||||
pages.POST("/datasource/list", rt.auth(), rt.user(), rt.datasourceList)
|
||||
pages.POST("/datasource/plugin/list", rt.auth(), rt.pluginList)
|
||||
@@ -423,6 +449,9 @@ func (rt *Router) Config(r *gin.Engine) {
|
||||
pages.PUT("/es-index-pattern", rt.auth(), rt.admin(), rt.esIndexPatternPut)
|
||||
pages.DELETE("/es-index-pattern", rt.auth(), rt.admin(), rt.esIndexPatternDel)
|
||||
|
||||
pages.GET("/embedded-dashboards", rt.auth(), rt.user(), rt.perm("/embedded-dashboards"), rt.embeddedDashboardsGet)
|
||||
pages.PUT("/embedded-dashboards", rt.auth(), rt.user(), rt.perm("/embedded-dashboards/put"), rt.embeddedDashboardsPut)
|
||||
|
||||
pages.GET("/user-variable-configs", rt.auth(), rt.user(), rt.perm("/help/variable-configs"), rt.userVariableConfigGets)
|
||||
pages.POST("/user-variable-config", rt.auth(), rt.user(), rt.perm("/help/variable-configs"), rt.userVariableConfigAdd)
|
||||
pages.PUT("/user-variable-config/:id", rt.auth(), rt.user(), rt.perm("/help/variable-configs"), rt.userVariableConfigPut)
|
||||
@@ -446,6 +475,7 @@ func (rt *Router) Config(r *gin.Engine) {
|
||||
pages.GET("/builtin-payload/:id", rt.auth(), rt.user(), rt.perm("/built-in-components"), rt.builtinPayloadGet)
|
||||
pages.PUT("/builtin-payloads", rt.auth(), rt.user(), rt.perm("/built-in-components/put"), rt.builtinPayloadsPut)
|
||||
pages.DELETE("/builtin-payloads", rt.auth(), rt.user(), rt.perm("/built-in-components/del"), rt.builtinPayloadsDel)
|
||||
pages.GET("/builtin-payload", rt.auth(), rt.user(), rt.builtinPayloadsGetByUUIDOrID)
|
||||
}
|
||||
|
||||
r.GET("/api/n9e/versions", func(c *gin.Context) {
|
||||
@@ -522,6 +552,7 @@ func (rt *Router) Config(r *gin.Engine) {
|
||||
service.GET("/config/:id", rt.configGet)
|
||||
service.GET("/configs", rt.configsGet)
|
||||
service.GET("/config", rt.configGetByKey)
|
||||
service.GET("/all-configs", rt.configGetAll)
|
||||
service.PUT("/configs", rt.configsPut)
|
||||
service.POST("/configs", rt.configsPost)
|
||||
service.DELETE("/configs", rt.configsDel)
|
||||
@@ -539,6 +570,11 @@ func (rt *Router) Config(r *gin.Engine) {
|
||||
|
||||
service.GET("/targets-of-alert-rule", rt.targetsOfAlertRule)
|
||||
|
||||
service.POST("/notify-record", rt.notificationRecordAdd)
|
||||
|
||||
service.GET("/alert-cur-events-del-by-hash", rt.alertCurEventDelByHash)
|
||||
|
||||
service.POST("/center/heartbeat", rt.heartbeat)
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
@@ -65,7 +65,8 @@ func (rt *Router) alertCurEventsCard(c *gin.Context) {
|
||||
ginx.Dangerous(err)
|
||||
|
||||
// 最多获取50000个,获取太多也没啥意义
|
||||
list, err := models.AlertCurEventGets(rt.Ctx, prods, bgids, stime, etime, severity, dsIds, cates, query, 50000, 0)
|
||||
list, err := models.AlertCurEventsGet(rt.Ctx, prods, bgids, stime, etime, severity, dsIds,
|
||||
cates, 0, query, 50000, 0)
|
||||
ginx.Dangerous(err)
|
||||
|
||||
cardmap := make(map[string]*AlertCard)
|
||||
@@ -162,13 +163,17 @@ func (rt *Router) alertCurEventsList(c *gin.Context) {
|
||||
cates = strings.Split(cate, ",")
|
||||
}
|
||||
|
||||
ruleId := ginx.QueryInt64(c, "rid", 0)
|
||||
|
||||
bgids, err := GetBusinessGroupIds(c, rt.Ctx, rt.Center.EventHistoryGroupView)
|
||||
ginx.Dangerous(err)
|
||||
|
||||
total, err := models.AlertCurEventTotal(rt.Ctx, prods, bgids, stime, etime, severity, dsIds, cates, query)
|
||||
total, err := models.AlertCurEventTotal(rt.Ctx, prods, bgids, stime, etime, severity, dsIds,
|
||||
cates, ruleId, query)
|
||||
ginx.Dangerous(err)
|
||||
|
||||
list, err := models.AlertCurEventGets(rt.Ctx, prods, bgids, stime, etime, severity, dsIds, cates, query, limit, ginx.Offset(c, limit))
|
||||
list, err := models.AlertCurEventsGet(rt.Ctx, prods, bgids, stime, etime, severity, dsIds,
|
||||
cates, ruleId, query, limit, ginx.Offset(c, limit))
|
||||
ginx.Dangerous(err)
|
||||
|
||||
cache := make(map[int64]*models.UserGroup)
|
||||
@@ -201,7 +206,9 @@ func (rt *Router) checkCurEventBusiGroupRWPermission(c *gin.Context, ids []int64
|
||||
for i := 0; i < len(ids); i++ {
|
||||
event, err := models.AlertCurEventGetById(rt.Ctx, ids[i])
|
||||
ginx.Dangerous(err)
|
||||
|
||||
if event == nil {
|
||||
continue
|
||||
}
|
||||
if _, has := set[event.GroupId]; !has {
|
||||
rt.bgrwCheck(c, event.GroupId)
|
||||
set[event.GroupId] = struct{}{}
|
||||
@@ -222,6 +229,12 @@ func (rt *Router) alertCurEventGet(c *gin.Context) {
|
||||
rt.bgroCheck(c, event.GroupId)
|
||||
}
|
||||
|
||||
ruleConfig, needReset := models.FillRuleConfigTplName(rt.Ctx, event.RuleConfig)
|
||||
if needReset {
|
||||
event.RuleConfigJson = ruleConfig
|
||||
}
|
||||
|
||||
event.LastEvalTime = event.TriggerTime
|
||||
ginx.NewRender(c).Data(event, nil)
|
||||
}
|
||||
|
||||
@@ -229,3 +242,8 @@ func (rt *Router) alertCurEventsStatistics(c *gin.Context) {
|
||||
|
||||
ginx.NewRender(c).Data(models.AlertCurEventStatistics(rt.Ctx, time.Now()), nil)
|
||||
}
|
||||
|
||||
func (rt *Router) alertCurEventDelByHash(c *gin.Context) {
|
||||
hash := ginx.QueryStr(c, "hash")
|
||||
ginx.NewRender(c).Message(models.AlertCurEventDelByHash(rt.Ctx, hash))
|
||||
}
|
||||
|
||||
@@ -54,13 +54,17 @@ func (rt *Router) alertHisEventsList(c *gin.Context) {
|
||||
cates = strings.Split(cate, ",")
|
||||
}
|
||||
|
||||
ruleId := ginx.QueryInt64(c, "rid", 0)
|
||||
|
||||
bgids, err := GetBusinessGroupIds(c, rt.Ctx, rt.Center.EventHistoryGroupView)
|
||||
ginx.Dangerous(err)
|
||||
|
||||
total, err := models.AlertHisEventTotal(rt.Ctx, prods, bgids, stime, etime, severity, recovered, dsIds, cates, query)
|
||||
total, err := models.AlertHisEventTotal(rt.Ctx, prods, bgids, stime, etime, severity,
|
||||
recovered, dsIds, cates, ruleId, query)
|
||||
ginx.Dangerous(err)
|
||||
|
||||
list, err := models.AlertHisEventGets(rt.Ctx, prods, bgids, stime, etime, severity, recovered, dsIds, cates, query, limit, ginx.Offset(c, limit))
|
||||
list, err := models.AlertHisEventGets(rt.Ctx, prods, bgids, stime, etime, severity, recovered,
|
||||
dsIds, cates, ruleId, query, limit, ginx.Offset(c, limit))
|
||||
ginx.Dangerous(err)
|
||||
|
||||
cache := make(map[int64]*models.UserGroup)
|
||||
@@ -87,6 +91,11 @@ func (rt *Router) alertHisEventGet(c *gin.Context) {
|
||||
rt.bgroCheck(c, event.GroupId)
|
||||
}
|
||||
|
||||
ruleConfig, needReset := models.FillRuleConfigTplName(rt.Ctx, event.RuleConfig)
|
||||
if needReset {
|
||||
event.RuleConfigJson = ruleConfig
|
||||
}
|
||||
|
||||
ginx.NewRender(c).Data(event, err)
|
||||
}
|
||||
|
||||
|
||||
@@ -1,14 +1,23 @@
|
||||
package router
|
||||
|
||||
import (
|
||||
"encoding/json"
|
||||
"fmt"
|
||||
"net/http"
|
||||
"regexp"
|
||||
"strconv"
|
||||
"strings"
|
||||
"time"
|
||||
|
||||
"gopkg.in/yaml.v2"
|
||||
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
"github.com/ccfos/nightingale/v6/pushgw/pconf"
|
||||
"github.com/ccfos/nightingale/v6/pushgw/writer"
|
||||
|
||||
"github.com/gin-gonic/gin"
|
||||
"github.com/jinzhu/copier"
|
||||
"github.com/prometheus/prometheus/prompb"
|
||||
"github.com/toolkits/pkg/ginx"
|
||||
"github.com/toolkits/pkg/i18n"
|
||||
"github.com/toolkits/pkg/str"
|
||||
@@ -28,6 +37,18 @@ func (rt *Router) alertRuleGets(c *gin.Context) {
|
||||
ginx.NewRender(c).Data(ars, err)
|
||||
}
|
||||
|
||||
func getAlertCueEventTimeRange(c *gin.Context) (stime, etime int64) {
|
||||
stime = ginx.QueryInt64(c, "stime", 0)
|
||||
etime = ginx.QueryInt64(c, "etime", 0)
|
||||
if etime == 0 {
|
||||
etime = time.Now().Unix()
|
||||
}
|
||||
if stime == 0 || stime >= etime {
|
||||
stime = etime - 30*24*int64(time.Hour.Seconds())
|
||||
}
|
||||
return
|
||||
}
|
||||
|
||||
func (rt *Router) alertRuleGetsByGids(c *gin.Context) {
|
||||
gids := str.IdsInt64(ginx.QueryStr(c, "gids", ""), ",")
|
||||
if len(gids) > 0 {
|
||||
@@ -51,9 +72,30 @@ func (rt *Router) alertRuleGetsByGids(c *gin.Context) {
|
||||
ars, err := models.AlertRuleGetsByBGIds(rt.Ctx, gids)
|
||||
if err == nil {
|
||||
cache := make(map[int64]*models.UserGroup)
|
||||
rids := make([]int64, 0, len(ars))
|
||||
names := make([]string, 0, len(ars))
|
||||
for i := 0; i < len(ars); i++ {
|
||||
ars[i].FillNotifyGroups(rt.Ctx, cache)
|
||||
ars[i].FillSeverities()
|
||||
rids = append(rids, ars[i].Id)
|
||||
names = append(names, ars[i].UpdateBy)
|
||||
}
|
||||
|
||||
stime, etime := getAlertCueEventTimeRange(c)
|
||||
cnt := models.AlertCurEventCountByRuleId(rt.Ctx, rids, stime, etime)
|
||||
if cnt != nil {
|
||||
for i := 0; i < len(ars); i++ {
|
||||
ars[i].CurEventCount = cnt[ars[i].Id]
|
||||
}
|
||||
}
|
||||
|
||||
users := models.UserMapGet(rt.Ctx, "username in (?)", names)
|
||||
if users != nil {
|
||||
for i := 0; i < len(ars); i++ {
|
||||
if user, exist := users[ars[i].UpdateBy]; exist {
|
||||
ars[i].UpdateByNickname = user.Nickname
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
ginx.NewRender(c).Data(ars, err)
|
||||
@@ -121,6 +163,34 @@ func (rt *Router) alertRuleAddByImport(c *gin.Context) {
|
||||
ginx.NewRender(c).Data(reterr, nil)
|
||||
}
|
||||
|
||||
type promRuleForm struct {
|
||||
Payload string `json:"payload" binding:"required"`
|
||||
DatasourceIds []int64 `json:"datasource_ids" binding:"required"`
|
||||
Disabled int `json:"disabled" binding:"gte=0,lte=1"`
|
||||
}
|
||||
|
||||
func (rt *Router) alertRuleAddByImportPromRule(c *gin.Context) {
|
||||
var f promRuleForm
|
||||
ginx.Dangerous(c.BindJSON(&f))
|
||||
|
||||
var pr struct {
|
||||
Groups []models.PromRuleGroup `yaml:"groups"`
|
||||
}
|
||||
err := yaml.Unmarshal([]byte(f.Payload), &pr)
|
||||
if err != nil {
|
||||
ginx.Bomb(http.StatusBadRequest, "invalid yaml format, please use the example format. err: %v", err)
|
||||
}
|
||||
|
||||
if len(pr.Groups) == 0 {
|
||||
ginx.Bomb(http.StatusBadRequest, "input yaml is empty")
|
||||
}
|
||||
|
||||
lst := models.DealPromGroup(pr.Groups, f.DatasourceIds, f.Disabled)
|
||||
username := c.MustGet("username").(string)
|
||||
bgid := ginx.UrlParamInt64(c, "id")
|
||||
ginx.NewRender(c).Data(rt.alertRuleAdd(lst, username, bgid, c.GetHeader("X-Language")), nil)
|
||||
}
|
||||
|
||||
func (rt *Router) alertRuleAddByService(c *gin.Context) {
|
||||
var lst []models.AlertRule
|
||||
ginx.BindJSON(c, &lst)
|
||||
@@ -271,6 +341,43 @@ func (rt *Router) alertRulePutFields(c *gin.Context) {
|
||||
continue
|
||||
}
|
||||
|
||||
if f.Action == "update_triggers" {
|
||||
if triggers, has := f.Fields["triggers"]; has {
|
||||
originRule := ar.RuleConfigJson.(map[string]interface{})
|
||||
originRule["triggers"] = triggers
|
||||
b, err := json.Marshal(originRule)
|
||||
ginx.Dangerous(err)
|
||||
ginx.Dangerous(ar.UpdateFieldsMap(rt.Ctx, map[string]interface{}{"rule_config": string(b)}))
|
||||
continue
|
||||
}
|
||||
}
|
||||
|
||||
if f.Action == "annotations_add" {
|
||||
if annotations, has := f.Fields["annotations"]; has {
|
||||
annotationsMap := annotations.(map[string]interface{})
|
||||
for k, v := range annotationsMap {
|
||||
ar.AnnotationsJSON[k] = v.(string)
|
||||
}
|
||||
b, err := json.Marshal(ar.AnnotationsJSON)
|
||||
ginx.Dangerous(err)
|
||||
ginx.Dangerous(ar.UpdateFieldsMap(rt.Ctx, map[string]interface{}{"annotations": string(b)}))
|
||||
continue
|
||||
}
|
||||
}
|
||||
|
||||
if f.Action == "annotations_del" {
|
||||
if annotations, has := f.Fields["annotations"]; has {
|
||||
annotationsKeys := annotations.(map[string]interface{})
|
||||
for key := range annotationsKeys {
|
||||
delete(ar.AnnotationsJSON, key)
|
||||
}
|
||||
b, err := json.Marshal(ar.AnnotationsJSON)
|
||||
ginx.Dangerous(err)
|
||||
ginx.Dangerous(ar.UpdateFieldsMap(rt.Ctx, map[string]interface{}{"annotations": string(b)}))
|
||||
continue
|
||||
}
|
||||
}
|
||||
|
||||
if f.Action == "callback_add" {
|
||||
// 增加一个 callback 地址
|
||||
if callbacks, has := f.Fields["callbacks"]; has {
|
||||
@@ -316,6 +423,20 @@ func (rt *Router) alertRuleGet(c *gin.Context) {
|
||||
ginx.NewRender(c).Data(ar, err)
|
||||
}
|
||||
|
||||
func (rt *Router) alertRulePureGet(c *gin.Context) {
|
||||
arid := ginx.UrlParamInt64(c, "arid")
|
||||
|
||||
ar, err := models.AlertRuleGetById(rt.Ctx, arid)
|
||||
ginx.Dangerous(err)
|
||||
|
||||
if ar == nil {
|
||||
ginx.NewRender(c, http.StatusNotFound).Message("No such AlertRule")
|
||||
return
|
||||
}
|
||||
|
||||
ginx.NewRender(c).Data(ar, err)
|
||||
}
|
||||
|
||||
// pre validation before save rule
|
||||
func (rt *Router) alertRuleValidation(c *gin.Context) {
|
||||
var f models.AlertRule //new
|
||||
@@ -388,3 +509,138 @@ func (rt *Router) alertRuleCallbacks(c *gin.Context) {
|
||||
|
||||
ginx.NewRender(c).Data(callbacks, nil)
|
||||
}
|
||||
|
||||
type alertRuleTestForm struct {
|
||||
Configs []*pconf.RelabelConfig `json:"configs"`
|
||||
Tags []string `json:"tags"`
|
||||
}
|
||||
|
||||
func (rt *Router) relabelTest(c *gin.Context) {
|
||||
var f alertRuleTestForm
|
||||
ginx.BindJSON(c, &f)
|
||||
|
||||
if len(f.Tags) == 0 || len(f.Configs) == 0 {
|
||||
ginx.Bomb(http.StatusBadRequest, "relabel config is empty")
|
||||
}
|
||||
|
||||
labels := make([]prompb.Label, len(f.Tags))
|
||||
for i, tag := range f.Tags {
|
||||
label := strings.SplitN(tag, "=", 2)
|
||||
if len(label) != 2 {
|
||||
ginx.Bomb(http.StatusBadRequest, "tag:%s format error", tag)
|
||||
}
|
||||
|
||||
labels[i] = prompb.Label{Name: label[0], Value: label[1]}
|
||||
}
|
||||
|
||||
for i := 0; i < len(f.Configs); i++ {
|
||||
if f.Configs[i].Replacement == "" {
|
||||
f.Configs[i].Replacement = "$1"
|
||||
}
|
||||
|
||||
if f.Configs[i].Separator == "" {
|
||||
f.Configs[i].Separator = ";"
|
||||
}
|
||||
|
||||
if f.Configs[i].Regex == "" {
|
||||
f.Configs[i].Regex = "(.*)"
|
||||
}
|
||||
}
|
||||
|
||||
relabels := writer.Process(labels, f.Configs...)
|
||||
|
||||
var tags []string
|
||||
for _, label := range relabels {
|
||||
tags = append(tags, fmt.Sprintf("%s=%s", label.Name, label.Value))
|
||||
}
|
||||
|
||||
ginx.NewRender(c).Data(tags, nil)
|
||||
}
|
||||
|
||||
type identListForm struct {
|
||||
Ids []int64 `json:"ids"`
|
||||
IdentList []string `json:"ident_list"`
|
||||
}
|
||||
|
||||
func containsIdentOperator(s string) bool {
|
||||
pattern := `ident\s*(!=|!~|=~)`
|
||||
matched, err := regexp.MatchString(pattern, s)
|
||||
if err != nil {
|
||||
return false
|
||||
}
|
||||
return matched
|
||||
}
|
||||
|
||||
func (rt *Router) cloneToMachine(c *gin.Context) {
|
||||
var f identListForm
|
||||
ginx.BindJSON(c, &f)
|
||||
|
||||
if len(f.IdentList) == 0 {
|
||||
ginx.Bomb(http.StatusBadRequest, "ident_list is empty")
|
||||
}
|
||||
|
||||
alertRules, err := models.AlertRuleGetsByIds(rt.Ctx, f.Ids)
|
||||
ginx.Dangerous(err)
|
||||
|
||||
re := regexp.MustCompile(`ident\s*=\s*\\".*?\\"`)
|
||||
|
||||
user := c.MustGet("username").(string)
|
||||
now := time.Now().Unix()
|
||||
|
||||
newRules := make([]*models.AlertRule, 0)
|
||||
|
||||
reterr := make(map[string]map[string]string)
|
||||
|
||||
for i := range alertRules {
|
||||
errMsg := make(map[string]string)
|
||||
|
||||
if alertRules[i].Cate != "prometheus" {
|
||||
errMsg["all"] = "Only Prometheus rule can be cloned to machines"
|
||||
reterr[alertRules[i].Name] = errMsg
|
||||
continue
|
||||
}
|
||||
|
||||
if containsIdentOperator(alertRules[i].RuleConfig) {
|
||||
errMsg["all"] = "promql is missing ident"
|
||||
reterr[alertRules[i].Name] = errMsg
|
||||
continue
|
||||
}
|
||||
|
||||
for j := range f.IdentList {
|
||||
alertRules[i].RuleConfig = re.ReplaceAllString(alertRules[i].RuleConfig, fmt.Sprintf(`ident=\"%s\"`, f.IdentList[j]))
|
||||
|
||||
newRule := &models.AlertRule{}
|
||||
if err := copier.Copy(newRule, alertRules[i]); err != nil {
|
||||
errMsg[f.IdentList[j]] = fmt.Sprintf("fail to clone rule, err: %s", err)
|
||||
continue
|
||||
}
|
||||
|
||||
newRule.Id = 0
|
||||
newRule.Name = alertRules[i].Name + "_" + f.IdentList[j]
|
||||
newRule.CreateBy = user
|
||||
newRule.UpdateBy = user
|
||||
newRule.UpdateAt = now
|
||||
newRule.CreateAt = now
|
||||
newRule.RuleConfig = alertRules[i].RuleConfig
|
||||
|
||||
exist, err := models.AlertRuleExists(rt.Ctx, 0, newRule.GroupId, newRule.DatasourceIdsJson, newRule.Name)
|
||||
if err != nil {
|
||||
errMsg[f.IdentList[j]] = err.Error()
|
||||
continue
|
||||
}
|
||||
|
||||
if exist {
|
||||
errMsg[f.IdentList[j]] = fmt.Sprintf("rule already exists, ruleName: %s", newRule.Name)
|
||||
continue
|
||||
}
|
||||
|
||||
newRules = append(newRules, newRule)
|
||||
}
|
||||
|
||||
if len(errMsg) > 0 {
|
||||
reterr[alertRules[i].Name] = errMsg
|
||||
}
|
||||
}
|
||||
|
||||
ginx.NewRender(c).Data(reterr, models.InsertAlertRule(rt.Ctx, newRules))
|
||||
}
|
||||
|
||||
@@ -94,6 +94,14 @@ func (rt *Router) boardGet(c *gin.Context) {
|
||||
ginx.NewRender(c).Data(board, nil)
|
||||
}
|
||||
|
||||
// 根据 bids 参数,获取多个 board
|
||||
func (rt *Router) boardGetsByBids(c *gin.Context) {
|
||||
bids := str.IdsInt64(ginx.QueryStr(c, "bids", ""), ",")
|
||||
boards, err := models.BoardGetsByBids(rt.Ctx, bids)
|
||||
ginx.Dangerous(err)
|
||||
ginx.NewRender(c).Data(boards, err)
|
||||
}
|
||||
|
||||
func (rt *Router) boardPureGet(c *gin.Context) {
|
||||
board, err := models.BoardGetByID(rt.Ctx, ginx.UrlParamInt64(c, "bid"))
|
||||
ginx.Dangerous(err)
|
||||
|
||||
@@ -4,10 +4,15 @@ import (
|
||||
"net/http"
|
||||
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
"github.com/ccfos/nightingale/v6/pkg/ctx"
|
||||
|
||||
"github.com/gin-gonic/gin"
|
||||
"github.com/toolkits/pkg/ginx"
|
||||
"gorm.io/gorm"
|
||||
)
|
||||
|
||||
const SYSTEM = "system"
|
||||
|
||||
func (rt *Router) builtinComponentsAdd(c *gin.Context) {
|
||||
var lst []models.BuiltinComponent
|
||||
ginx.BindJSON(c, &lst)
|
||||
@@ -50,10 +55,31 @@ func (rt *Router) builtinComponentsPut(c *gin.Context) {
|
||||
return
|
||||
}
|
||||
|
||||
if bc.CreatedBy == SYSTEM {
|
||||
req.Ident = bc.Ident
|
||||
}
|
||||
|
||||
username := Username(c)
|
||||
req.UpdatedBy = username
|
||||
|
||||
ginx.NewRender(c).Message(bc.Update(rt.Ctx, req))
|
||||
err = models.DB(rt.Ctx).Transaction(func(tx *gorm.DB) error {
|
||||
tCtx := &ctx.Context{
|
||||
DB: tx,
|
||||
}
|
||||
|
||||
txErr := models.BuiltinMetricBatchUpdateColumn(tCtx, "typ", bc.Ident, req.Ident, req.UpdatedBy)
|
||||
if txErr != nil {
|
||||
return txErr
|
||||
}
|
||||
|
||||
txErr = bc.Update(tCtx, req)
|
||||
if txErr != nil {
|
||||
return txErr
|
||||
}
|
||||
return nil
|
||||
})
|
||||
|
||||
ginx.NewRender(c).Message(err)
|
||||
}
|
||||
|
||||
func (rt *Router) builtinComponentsDel(c *gin.Context) {
|
||||
|
||||
@@ -6,6 +6,7 @@ import (
|
||||
"strings"
|
||||
"time"
|
||||
|
||||
"github.com/BurntSushi/toml"
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
"github.com/gin-gonic/gin"
|
||||
"github.com/toolkits/pkg/ginx"
|
||||
@@ -52,15 +53,15 @@ func (rt *Router) builtinPayloadsAdd(c *gin.Context) {
|
||||
}
|
||||
|
||||
bp := models.BuiltinPayload{
|
||||
Type: lst[i].Type,
|
||||
Component: lst[i].Component,
|
||||
Cate: lst[i].Cate,
|
||||
Name: rule.Name,
|
||||
Tags: rule.AppendTags,
|
||||
UUID: rule.UUID,
|
||||
Content: string(contentBytes),
|
||||
CreatedBy: username,
|
||||
UpdatedBy: username,
|
||||
Type: lst[i].Type,
|
||||
ComponentID: lst[i].ComponentID,
|
||||
Cate: lst[i].Cate,
|
||||
Name: rule.Name,
|
||||
Tags: rule.AppendTags,
|
||||
UUID: rule.UUID,
|
||||
Content: string(contentBytes),
|
||||
CreatedBy: username,
|
||||
UpdatedBy: username,
|
||||
}
|
||||
|
||||
if err := bp.Add(rt.Ctx, username); err != nil {
|
||||
@@ -81,15 +82,15 @@ func (rt *Router) builtinPayloadsAdd(c *gin.Context) {
|
||||
}
|
||||
|
||||
bp := models.BuiltinPayload{
|
||||
Type: lst[i].Type,
|
||||
Component: lst[i].Component,
|
||||
Cate: lst[i].Cate,
|
||||
Name: alertRule.Name,
|
||||
Tags: alertRule.AppendTags,
|
||||
UUID: alertRule.UUID,
|
||||
Content: lst[i].Content,
|
||||
CreatedBy: username,
|
||||
UpdatedBy: username,
|
||||
Type: lst[i].Type,
|
||||
ComponentID: lst[i].ComponentID,
|
||||
Cate: lst[i].Cate,
|
||||
Name: alertRule.Name,
|
||||
Tags: alertRule.AppendTags,
|
||||
UUID: alertRule.UUID,
|
||||
Content: lst[i].Content,
|
||||
CreatedBy: username,
|
||||
UpdatedBy: username,
|
||||
}
|
||||
|
||||
if err := bp.Add(rt.Ctx, username); err != nil {
|
||||
@@ -115,15 +116,15 @@ func (rt *Router) builtinPayloadsAdd(c *gin.Context) {
|
||||
}
|
||||
|
||||
bp := models.BuiltinPayload{
|
||||
Type: lst[i].Type,
|
||||
Component: lst[i].Component,
|
||||
Cate: lst[i].Cate,
|
||||
Name: dashboard.Name,
|
||||
Tags: dashboard.Tags,
|
||||
UUID: dashboard.UUID,
|
||||
Content: string(contentBytes),
|
||||
CreatedBy: username,
|
||||
UpdatedBy: username,
|
||||
Type: lst[i].Type,
|
||||
ComponentID: lst[i].ComponentID,
|
||||
Cate: lst[i].Cate,
|
||||
Name: dashboard.Name,
|
||||
Tags: dashboard.Tags,
|
||||
UUID: dashboard.UUID,
|
||||
Content: string(contentBytes),
|
||||
CreatedBy: username,
|
||||
UpdatedBy: username,
|
||||
}
|
||||
|
||||
if err := bp.Add(rt.Ctx, username); err != nil {
|
||||
@@ -144,21 +145,29 @@ func (rt *Router) builtinPayloadsAdd(c *gin.Context) {
|
||||
}
|
||||
|
||||
bp := models.BuiltinPayload{
|
||||
Type: lst[i].Type,
|
||||
Component: lst[i].Component,
|
||||
Cate: lst[i].Cate,
|
||||
Name: dashboard.Name,
|
||||
Tags: dashboard.Tags,
|
||||
UUID: dashboard.UUID,
|
||||
Content: lst[i].Content,
|
||||
CreatedBy: username,
|
||||
UpdatedBy: username,
|
||||
Type: lst[i].Type,
|
||||
ComponentID: lst[i].ComponentID,
|
||||
Cate: lst[i].Cate,
|
||||
Name: dashboard.Name,
|
||||
Tags: dashboard.Tags,
|
||||
UUID: dashboard.UUID,
|
||||
Content: lst[i].Content,
|
||||
CreatedBy: username,
|
||||
UpdatedBy: username,
|
||||
}
|
||||
|
||||
if err := bp.Add(rt.Ctx, username); err != nil {
|
||||
reterr[bp.Name] = i18n.Sprintf(c.GetHeader("X-Language"), err.Error())
|
||||
}
|
||||
} else {
|
||||
if lst[i].Type == "collect" {
|
||||
c := make(map[string]interface{})
|
||||
if _, err := toml.Decode(lst[i].Content, &c); err != nil {
|
||||
reterr[lst[i].Name] = err.Error()
|
||||
continue
|
||||
}
|
||||
}
|
||||
|
||||
if err := lst[i].Add(rt.Ctx, username); err != nil {
|
||||
reterr[lst[i].Name] = i18n.Sprintf(c.GetHeader("X-Language"), err.Error())
|
||||
}
|
||||
@@ -171,19 +180,20 @@ func (rt *Router) builtinPayloadsAdd(c *gin.Context) {
|
||||
|
||||
func (rt *Router) builtinPayloadsGets(c *gin.Context) {
|
||||
typ := ginx.QueryStr(c, "type", "")
|
||||
component := ginx.QueryStr(c, "component", "")
|
||||
ComponentID := ginx.QueryInt64(c, "component_id", 0)
|
||||
|
||||
cate := ginx.QueryStr(c, "cate", "")
|
||||
query := ginx.QueryStr(c, "query", "")
|
||||
|
||||
lst, err := models.BuiltinPayloadGets(rt.Ctx, typ, component, cate, query)
|
||||
lst, err := models.BuiltinPayloadGets(rt.Ctx, uint64(ComponentID), typ, cate, query)
|
||||
ginx.NewRender(c).Data(lst, err)
|
||||
}
|
||||
|
||||
func (rt *Router) builtinPayloadcatesGet(c *gin.Context) {
|
||||
typ := ginx.QueryStr(c, "type", "")
|
||||
component := ginx.QueryStr(c, "component", "")
|
||||
ComponentID := ginx.QueryInt64(c, "component_id", 0)
|
||||
|
||||
cates, err := models.BuiltinPayloadCates(rt.Ctx, typ, component)
|
||||
cates, err := models.BuiltinPayloadCates(rt.Ctx, typ, uint64(ComponentID))
|
||||
ginx.NewRender(c).Data(cates, err)
|
||||
}
|
||||
|
||||
@@ -229,6 +239,11 @@ func (rt *Router) builtinPayloadsPut(c *gin.Context) {
|
||||
|
||||
req.Name = dashboard.Name
|
||||
req.Tags = dashboard.Tags
|
||||
} else if req.Type == "collect" {
|
||||
c := make(map[string]interface{})
|
||||
if _, err := toml.Decode(req.Content, &c); err != nil {
|
||||
ginx.Bomb(http.StatusBadRequest, err.Error())
|
||||
}
|
||||
}
|
||||
|
||||
username := Username(c)
|
||||
@@ -245,3 +260,15 @@ func (rt *Router) builtinPayloadsDel(c *gin.Context) {
|
||||
|
||||
ginx.NewRender(c).Message(models.BuiltinPayloadDels(rt.Ctx, req.Ids))
|
||||
}
|
||||
|
||||
func (rt *Router) builtinPayloadsGetByUUIDOrID(c *gin.Context) {
|
||||
uuid := ginx.QueryInt64(c, "uuid", 0)
|
||||
// 优先以 uuid 为准
|
||||
if uuid != 0 {
|
||||
ginx.NewRender(c).Data(models.BuiltinPayloadGet(rt.Ctx, "uuid = ?", uuid))
|
||||
return
|
||||
}
|
||||
|
||||
id := ginx.QueryInt64(c, "id", 0)
|
||||
ginx.NewRender(c).Data(models.BuiltinPayloadGet(rt.Ctx, "id = ?", id))
|
||||
}
|
||||
|
||||
@@ -1,13 +1,16 @@
|
||||
package router
|
||||
|
||||
import (
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
"time"
|
||||
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
|
||||
"github.com/gin-gonic/gin"
|
||||
"github.com/toolkits/pkg/ginx"
|
||||
)
|
||||
|
||||
const EMBEDDEDDASHBOARD = "embedded-dashboards"
|
||||
|
||||
func (rt *Router) configsGet(c *gin.Context) {
|
||||
prefix := ginx.QueryStr(c, "prefix", "")
|
||||
limit := ginx.QueryInt(c, "limit", 10)
|
||||
@@ -21,6 +24,11 @@ func (rt *Router) configGet(c *gin.Context) {
|
||||
ginx.NewRender(c).Data(configs, err)
|
||||
}
|
||||
|
||||
func (rt *Router) configGetAll(c *gin.Context) {
|
||||
config, err := models.ConfigsGetAll(rt.Ctx)
|
||||
ginx.NewRender(c).Data(config, err)
|
||||
}
|
||||
|
||||
func (rt *Router) configGetByKey(c *gin.Context) {
|
||||
config, err := models.ConfigsGet(rt.Ctx, ginx.QueryStr(c, "key"))
|
||||
ginx.NewRender(c).Data(config, err)
|
||||
@@ -33,6 +41,18 @@ func (rt *Router) configPutByKey(c *gin.Context) {
|
||||
ginx.NewRender(c).Message(models.ConfigsSetWithUname(rt.Ctx, f.Ckey, f.Cval, username))
|
||||
}
|
||||
|
||||
func (rt *Router) embeddedDashboardsGet(c *gin.Context) {
|
||||
config, err := models.ConfigsGet(rt.Ctx, EMBEDDEDDASHBOARD)
|
||||
ginx.NewRender(c).Data(config, err)
|
||||
}
|
||||
|
||||
func (rt *Router) embeddedDashboardsPut(c *gin.Context) {
|
||||
var f models.Configs
|
||||
ginx.BindJSON(c, &f)
|
||||
username := c.MustGet("username").(string)
|
||||
ginx.NewRender(c).Message(models.ConfigsSetWithUname(rt.Ctx, EMBEDDEDDASHBOARD, f.Cval, username))
|
||||
}
|
||||
|
||||
func (rt *Router) configsDel(c *gin.Context) {
|
||||
var f idsForm
|
||||
ginx.BindJSON(c, &f)
|
||||
|
||||
@@ -92,10 +92,12 @@ func (rt *Router) datasourceUpsert(c *gin.Context) {
|
||||
var err error
|
||||
var count int64
|
||||
|
||||
err = DatasourceCheck(req)
|
||||
if err != nil {
|
||||
Dangerous(c, err)
|
||||
return
|
||||
if !req.ForceSave {
|
||||
err = DatasourceCheck(req)
|
||||
if err != nil {
|
||||
Dangerous(c, err)
|
||||
return
|
||||
}
|
||||
}
|
||||
|
||||
if req.Id == 0 {
|
||||
|
||||
@@ -45,6 +45,10 @@ func (rt *Router) statistic(c *gin.Context) {
|
||||
statistics, err = models.ConfigsUserVariableStatistics(rt.Ctx)
|
||||
ginx.NewRender(c).Data(statistics, err)
|
||||
return
|
||||
case "cval":
|
||||
statistics, err = models.ConfigCvalStatistics(rt.Ctx)
|
||||
ginx.NewRender(c).Data(statistics, err)
|
||||
return
|
||||
default:
|
||||
ginx.Bomb(http.StatusBadRequest, "invalid name")
|
||||
}
|
||||
@@ -65,6 +69,23 @@ func queryDatasourceIds(c *gin.Context) []int64 {
|
||||
return ids
|
||||
}
|
||||
|
||||
func queryStrListField(c *gin.Context, fieldName string, sep ...string) []string {
|
||||
str := ginx.QueryStr(c, fieldName, "")
|
||||
if str == "" {
|
||||
return nil
|
||||
}
|
||||
|
||||
lst := []string{str}
|
||||
for _, s := range sep {
|
||||
var newLst []string
|
||||
for _, str := range lst {
|
||||
newLst = append(newLst, strings.Split(str, s)...)
|
||||
}
|
||||
lst = newLst
|
||||
}
|
||||
return lst
|
||||
}
|
||||
|
||||
type idsForm struct {
|
||||
Ids []int64 `json:"ids"`
|
||||
IsSyncToFlashDuty bool `json:"is_sync_to_flashduty"`
|
||||
|
||||
@@ -3,19 +3,35 @@ package router
|
||||
import (
|
||||
"compress/gzip"
|
||||
"encoding/json"
|
||||
"errors"
|
||||
"io/ioutil"
|
||||
"sort"
|
||||
"strconv"
|
||||
"strings"
|
||||
"time"
|
||||
|
||||
"github.com/ccfos/nightingale/v6/center/metas"
|
||||
"github.com/ccfos/nightingale/v6/memsto"
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
"github.com/ccfos/nightingale/v6/pkg/ctx"
|
||||
"github.com/ccfos/nightingale/v6/pushgw/idents"
|
||||
|
||||
"github.com/gin-gonic/gin"
|
||||
"github.com/toolkits/pkg/ginx"
|
||||
"github.com/toolkits/pkg/logger"
|
||||
)
|
||||
|
||||
type HeartbeatHookFunc func(ident string) map[string]interface{}
|
||||
|
||||
func (rt *Router) heartbeat(c *gin.Context) {
|
||||
req, err := HandleHeartbeat(c, rt.Ctx, rt.Alert.Heartbeat.EngineName, rt.MetaSet, rt.IdentSet, rt.TargetCache)
|
||||
ginx.Dangerous(err)
|
||||
|
||||
m := rt.HeartbeatHook(req.Hostname)
|
||||
ginx.NewRender(c).Data(m, err)
|
||||
}
|
||||
|
||||
func HandleHeartbeat(c *gin.Context, ctx *ctx.Context, engineName string, metaSet *metas.Set, identSet *idents.Set, targetCache *memsto.TargetCacheType) (models.HostMeta, error) {
|
||||
var bs []byte
|
||||
var err error
|
||||
var r *gzip.Reader
|
||||
@@ -24,7 +40,7 @@ func (rt *Router) heartbeat(c *gin.Context) {
|
||||
r, err = gzip.NewReader(c.Request.Body)
|
||||
if err != nil {
|
||||
c.String(400, err.Error())
|
||||
return
|
||||
return req, err
|
||||
}
|
||||
defer r.Close()
|
||||
bs, err = ioutil.ReadAll(r)
|
||||
@@ -32,11 +48,19 @@ func (rt *Router) heartbeat(c *gin.Context) {
|
||||
} else {
|
||||
defer c.Request.Body.Close()
|
||||
bs, err = ioutil.ReadAll(c.Request.Body)
|
||||
ginx.Dangerous(err)
|
||||
if err != nil {
|
||||
return req, err
|
||||
}
|
||||
}
|
||||
|
||||
err = json.Unmarshal(bs, &req)
|
||||
ginx.Dangerous(err)
|
||||
if err != nil {
|
||||
return req, err
|
||||
}
|
||||
|
||||
if req.Hostname == "" {
|
||||
return req, errors.New("hostname is required")
|
||||
}
|
||||
|
||||
// maybe from pushgw
|
||||
if req.Offset == 0 {
|
||||
@@ -48,51 +72,132 @@ func (rt *Router) heartbeat(c *gin.Context) {
|
||||
}
|
||||
|
||||
if req.EngineName == "" {
|
||||
req.EngineName = rt.Alert.Heartbeat.EngineName
|
||||
req.EngineName = engineName
|
||||
}
|
||||
|
||||
rt.MetaSet.Set(req.Hostname, req)
|
||||
metaSet.Set(req.Hostname, req)
|
||||
var items = make(map[string]struct{})
|
||||
items[req.Hostname] = struct{}{}
|
||||
rt.IdentSet.MSet(items)
|
||||
identSet.MSet(items)
|
||||
|
||||
if target, has := rt.TargetCache.Get(req.Hostname); has && target != nil {
|
||||
gid := ginx.QueryInt64(c, "gid", 0)
|
||||
if target, has := targetCache.Get(req.Hostname); has && target != nil {
|
||||
gidsStr := ginx.QueryStr(c, "gid", "")
|
||||
overwriteGids := ginx.QueryBool(c, "overwrite_gids", false)
|
||||
hostIp := strings.TrimSpace(req.HostIp)
|
||||
gids := strings.Split(gidsStr, ",")
|
||||
|
||||
filed := make(map[string]interface{})
|
||||
if gid != 0 && gid != target.GroupId {
|
||||
filed["group_id"] = gid
|
||||
if overwriteGids {
|
||||
groupIds := make([]int64, 0)
|
||||
for i := range gids {
|
||||
if gids[i] == "" {
|
||||
continue
|
||||
}
|
||||
groupId, err := strconv.ParseInt(gids[i], 10, 64)
|
||||
if err != nil {
|
||||
logger.Warningf("update target:%s group ids failed, err: %v", req.Hostname, err)
|
||||
continue
|
||||
}
|
||||
groupIds = append(groupIds, groupId)
|
||||
}
|
||||
|
||||
err := models.TargetOverrideBgids(ctx, []string{target.Ident}, groupIds)
|
||||
if err != nil {
|
||||
logger.Warningf("update target:%s group ids failed, err: %v", target.Ident, err)
|
||||
}
|
||||
} else if gidsStr != "" {
|
||||
for i := range gids {
|
||||
groupId, err := strconv.ParseInt(gids[i], 10, 64)
|
||||
if err != nil {
|
||||
logger.Warningf("update target:%s group ids failed, err: %v", req.Hostname, err)
|
||||
continue
|
||||
}
|
||||
|
||||
if !target.MatchGroupId(groupId) {
|
||||
err := models.TargetBindBgids(ctx, []string{target.Ident}, []int64{groupId})
|
||||
if err != nil {
|
||||
logger.Warningf("update target:%s group ids failed, err: %v", target.Ident, err)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
newTarget := models.Target{}
|
||||
targetNeedUpdate := false
|
||||
if hostIp != "" && hostIp != target.HostIp {
|
||||
filed["host_ip"] = hostIp
|
||||
newTarget.HostIp = hostIp
|
||||
targetNeedUpdate = true
|
||||
}
|
||||
|
||||
if len(req.GlobalLabels) > 0 {
|
||||
hostTagsMap := target.GetHostTagsMap()
|
||||
hostTagNeedUpdate := false
|
||||
if len(hostTagsMap) != len(req.GlobalLabels) {
|
||||
hostTagNeedUpdate = true
|
||||
} else {
|
||||
for k, v := range req.GlobalLabels {
|
||||
if v == "" {
|
||||
continue
|
||||
}
|
||||
|
||||
if tagv, ok := hostTagsMap[k]; !ok || tagv != v {
|
||||
hostTagNeedUpdate = true
|
||||
break
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if hostTagNeedUpdate {
|
||||
lst := []string{}
|
||||
for k, v := range req.GlobalLabels {
|
||||
lst = append(lst, k+"="+v)
|
||||
}
|
||||
sort.Strings(lst)
|
||||
labels := strings.Join(lst, " ")
|
||||
if target.Tags != labels {
|
||||
filed["tags"] = labels
|
||||
newTarget.HostTags = lst
|
||||
targetNeedUpdate = true
|
||||
}
|
||||
|
||||
userTagsMap := target.GetTagsMap()
|
||||
userTagNeedUpdate := false
|
||||
userTags := []string{}
|
||||
for k, v := range userTagsMap {
|
||||
if v == "" {
|
||||
continue
|
||||
}
|
||||
|
||||
if _, ok := req.GlobalLabels[k]; !ok {
|
||||
userTags = append(userTags, k+"="+v)
|
||||
} else { // 该key在hostTags中已经存在
|
||||
userTagNeedUpdate = true
|
||||
}
|
||||
}
|
||||
|
||||
if req.EngineName != "" && req.EngineName != target.EngineName {
|
||||
filed["engine_name"] = req.EngineName
|
||||
if userTagNeedUpdate {
|
||||
newTarget.Tags = strings.Join(userTags, " ") + " "
|
||||
targetNeedUpdate = true
|
||||
}
|
||||
|
||||
if len(filed) > 0 {
|
||||
err := target.UpdateFieldsMap(rt.Ctx, filed)
|
||||
if req.EngineName != "" && req.EngineName != target.EngineName {
|
||||
newTarget.EngineName = req.EngineName
|
||||
targetNeedUpdate = true
|
||||
}
|
||||
|
||||
if req.AgentVersion != "" && req.AgentVersion != target.AgentVersion {
|
||||
newTarget.AgentVersion = req.AgentVersion
|
||||
targetNeedUpdate = true
|
||||
}
|
||||
|
||||
if req.OS != "" && req.OS != target.OS {
|
||||
newTarget.OS = req.OS
|
||||
targetNeedUpdate = true
|
||||
}
|
||||
|
||||
if targetNeedUpdate {
|
||||
err := models.DB(ctx).Model(&target).Updates(newTarget).Error
|
||||
if err != nil {
|
||||
logger.Errorf("update target fields failed, err: %v", err)
|
||||
}
|
||||
}
|
||||
logger.Debugf("heartbeat field:%+v target: %v", filed, *target)
|
||||
logger.Debugf("heartbeat field:%+v target: %v", newTarget, *target)
|
||||
}
|
||||
|
||||
ginx.NewRender(c).Message(err)
|
||||
return req, nil
|
||||
}
|
||||
|
||||
@@ -55,12 +55,12 @@ func (rt *Router) loginPost(c *gin.Context) {
|
||||
var err error
|
||||
lc := rt.Sso.LDAP.Copy()
|
||||
if lc.Enable {
|
||||
user, err = ldapx.LdapLogin(rt.Ctx, f.Username, authPassWord, lc.DefaultRoles, lc)
|
||||
user, err = ldapx.LdapLogin(rt.Ctx, f.Username, authPassWord, lc.DefaultRoles, lc.DefaultTeams, lc)
|
||||
if err != nil {
|
||||
logger.Debugf("ldap login failed: %v username: %s", err, f.Username)
|
||||
var errLoginInN9e error
|
||||
// to use n9e as the minimum guarantee for login
|
||||
if user, errLoginInN9e = models.PassLogin(rt.Ctx, f.Username, authPassWord); errLoginInN9e != nil {
|
||||
if user, errLoginInN9e = models.PassLogin(rt.Ctx, rt.Redis, f.Username, authPassWord); errLoginInN9e != nil {
|
||||
ginx.NewRender(c).Message("ldap login failed: %v; n9e login failed: %v", err, errLoginInN9e)
|
||||
return
|
||||
}
|
||||
@@ -68,7 +68,7 @@ func (rt *Router) loginPost(c *gin.Context) {
|
||||
user.RolesLst = strings.Fields(user.Roles)
|
||||
}
|
||||
} else {
|
||||
user, err = models.PassLogin(rt.Ctx, f.Username, authPassWord)
|
||||
user, err = models.PassLogin(rt.Ctx, rt.Redis, f.Username, authPassWord)
|
||||
ginx.Dangerous(err)
|
||||
}
|
||||
|
||||
@@ -262,6 +262,15 @@ func (rt *Router) loginCallback(c *gin.Context) {
|
||||
user.FullSsoFields("oidc", ret.Username, ret.Nickname, ret.Phone, ret.Email, rt.Sso.OIDC.DefaultRoles)
|
||||
// create user from oidc
|
||||
ginx.Dangerous(user.Add(rt.Ctx))
|
||||
|
||||
if len(rt.Sso.OIDC.DefaultTeams) > 0 {
|
||||
for _, gid := range rt.Sso.OIDC.DefaultTeams {
|
||||
err = models.UserGroupMemberAdd(rt.Ctx, gid, user.Id)
|
||||
if err != nil {
|
||||
logger.Errorf("user:%v UserGroupMemberAdd: %s", user, err)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// set user login state
|
||||
|
||||
@@ -50,7 +50,8 @@ func (rt *Router) alertMuteGets(c *gin.Context) {
|
||||
prods := strings.Fields(ginx.QueryStr(c, "prods", ""))
|
||||
bgid := ginx.QueryInt64(c, "bgid", -1)
|
||||
query := ginx.QueryStr(c, "query", "")
|
||||
lst, err := models.AlertMuteGets(rt.Ctx, prods, bgid, query)
|
||||
disabled := ginx.QueryInt(c, "disabled", -1)
|
||||
lst, err := models.AlertMuteGets(rt.Ctx, prods, bgid, disabled, query)
|
||||
|
||||
ginx.NewRender(c).Data(lst, err)
|
||||
}
|
||||
@@ -95,7 +96,8 @@ func (rt *Router) alertMuteAddByService(c *gin.Context) {
|
||||
var f models.AlertMute
|
||||
ginx.BindJSON(c, &f)
|
||||
|
||||
ginx.NewRender(c).Message(f.Add(rt.Ctx))
|
||||
err := f.Add(rt.Ctx)
|
||||
ginx.NewRender(c).Data(f.Id, err)
|
||||
}
|
||||
|
||||
func (rt *Router) alertMuteDel(c *gin.Context) {
|
||||
|
||||
205
center/router/router_notification_record.go
Normal file
205
center/router/router_notification_record.go
Normal file
@@ -0,0 +1,205 @@
|
||||
package router
|
||||
|
||||
import (
|
||||
"strings"
|
||||
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
"github.com/ccfos/nightingale/v6/pkg/ctx"
|
||||
|
||||
"github.com/gin-gonic/gin"
|
||||
"github.com/toolkits/pkg/ginx"
|
||||
"github.com/toolkits/pkg/logger"
|
||||
)
|
||||
|
||||
type NotificationResponse struct {
|
||||
SubRules []SubRule `json:"sub_rules"`
|
||||
Notifies map[string][]Record `json:"notifies"`
|
||||
}
|
||||
|
||||
type SubRule struct {
|
||||
SubID int64 `json:"sub_id"`
|
||||
Notifies map[string][]Record `json:"notifies"`
|
||||
}
|
||||
|
||||
type Notify struct {
|
||||
Channel string `json:"channel"`
|
||||
Records []Record `json:"records"`
|
||||
}
|
||||
|
||||
type Record struct {
|
||||
Target string `json:"target"`
|
||||
Username string `json:"username"`
|
||||
Status int `json:"status"`
|
||||
Detail string `json:"detail"`
|
||||
}
|
||||
|
||||
// notificationRecordAdd
|
||||
func (rt *Router) notificationRecordAdd(c *gin.Context) {
|
||||
var req models.NotificaitonRecord
|
||||
ginx.BindJSON(c, &req)
|
||||
err := req.Add(rt.Ctx)
|
||||
|
||||
ginx.NewRender(c).Data(req.Id, err)
|
||||
}
|
||||
|
||||
func (rt *Router) notificationRecordList(c *gin.Context) {
|
||||
eid := ginx.UrlParamInt64(c, "eid")
|
||||
lst, err := models.NotificaitonRecordsGetByEventId(rt.Ctx, eid)
|
||||
ginx.Dangerous(err)
|
||||
|
||||
response := buildNotificationResponse(rt.Ctx, lst)
|
||||
ginx.NewRender(c).Data(response, nil)
|
||||
}
|
||||
|
||||
func buildNotificationResponse(ctx *ctx.Context, nl []*models.NotificaitonRecord) NotificationResponse {
|
||||
response := NotificationResponse{
|
||||
SubRules: []SubRule{},
|
||||
Notifies: make(map[string][]Record),
|
||||
}
|
||||
|
||||
subRuleMap := make(map[int64]*SubRule)
|
||||
|
||||
// Collect all group IDs
|
||||
groupIdSet := make(map[int64]struct{})
|
||||
|
||||
// map[SubId]map[Channel]map[Target]index
|
||||
filter := make(map[int64]map[string]map[string]int)
|
||||
|
||||
for i, n := range nl {
|
||||
// 对相同的 channel-target 进行合并
|
||||
for _, gid := range n.GetGroupIds(ctx) {
|
||||
groupIdSet[gid] = struct{}{}
|
||||
}
|
||||
|
||||
if _, exists := filter[n.SubId]; !exists {
|
||||
filter[n.SubId] = make(map[string]map[string]int)
|
||||
}
|
||||
|
||||
if _, exists := filter[n.SubId][n.Channel]; !exists {
|
||||
filter[n.SubId][n.Channel] = make(map[string]int)
|
||||
}
|
||||
|
||||
idx, exists := filter[n.SubId][n.Channel][n.Target]
|
||||
if !exists {
|
||||
filter[n.SubId][n.Channel][n.Target] = i
|
||||
} else {
|
||||
if nl[idx].Status < n.Status {
|
||||
nl[idx].Status = n.Status
|
||||
}
|
||||
nl[idx].Details = nl[idx].Details + ", " + n.Details
|
||||
nl[i] = nil
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
// Fill usernames only once
|
||||
usernameByTarget := fillUserNames(ctx, groupIdSet)
|
||||
|
||||
for _, n := range nl {
|
||||
if n == nil {
|
||||
continue
|
||||
}
|
||||
|
||||
m := usernameByTarget[n.Target]
|
||||
usernames := make([]string, 0, len(m))
|
||||
for k := range m {
|
||||
usernames = append(usernames, k)
|
||||
}
|
||||
|
||||
if !checkChannel(n.Channel) {
|
||||
// Hide sensitive information
|
||||
n.Target = replaceLastEightChars(n.Target)
|
||||
}
|
||||
record := Record{
|
||||
Target: n.Target,
|
||||
Status: n.Status,
|
||||
Detail: n.Details,
|
||||
}
|
||||
|
||||
record.Username = strings.Join(usernames, ",")
|
||||
|
||||
if n.SubId > 0 {
|
||||
// Handle SubRules
|
||||
subRule, ok := subRuleMap[n.SubId]
|
||||
if !ok {
|
||||
newSubRule := &SubRule{
|
||||
SubID: n.SubId,
|
||||
}
|
||||
newSubRule.Notifies = make(map[string][]Record)
|
||||
newSubRule.Notifies[n.Channel] = []Record{record}
|
||||
|
||||
subRuleMap[n.SubId] = newSubRule
|
||||
} else {
|
||||
if _, exists := subRule.Notifies[n.Channel]; !exists {
|
||||
|
||||
subRule.Notifies[n.Channel] = []Record{record}
|
||||
} else {
|
||||
subRule.Notifies[n.Channel] = append(subRule.Notifies[n.Channel], record)
|
||||
}
|
||||
}
|
||||
continue
|
||||
}
|
||||
|
||||
if response.Notifies == nil {
|
||||
response.Notifies = make(map[string][]Record)
|
||||
}
|
||||
|
||||
if _, exists := response.Notifies[n.Channel]; !exists {
|
||||
response.Notifies[n.Channel] = []Record{record}
|
||||
} else {
|
||||
response.Notifies[n.Channel] = append(response.Notifies[n.Channel], record)
|
||||
}
|
||||
}
|
||||
|
||||
for _, subRule := range subRuleMap {
|
||||
response.SubRules = append(response.SubRules, *subRule)
|
||||
}
|
||||
|
||||
return response
|
||||
}
|
||||
|
||||
// check channel is one of the following: tx-sms, tx-voice, ali-sms, ali-voice, email, script
|
||||
func checkChannel(channel string) bool {
|
||||
switch channel {
|
||||
case "tx-sms", "tx-voice", "ali-sms", "ali-voice", "email", "script":
|
||||
return true
|
||||
}
|
||||
return false
|
||||
}
|
||||
|
||||
func replaceLastEightChars(s string) string {
|
||||
if len(s) <= 8 {
|
||||
return strings.Repeat("*", len(s))
|
||||
}
|
||||
return s[:len(s)-8] + strings.Repeat("*", 8)
|
||||
}
|
||||
|
||||
func fillUserNames(ctx *ctx.Context, groupIdSet map[int64]struct{}) map[string]map[string]struct{} {
|
||||
userNameByTarget := make(map[string]map[string]struct{})
|
||||
|
||||
gids := make([]int64, 0, len(groupIdSet))
|
||||
for gid := range groupIdSet {
|
||||
gids = append(gids, gid)
|
||||
}
|
||||
|
||||
users, err := models.UsersGetByGroupIds(ctx, gids)
|
||||
if err != nil {
|
||||
logger.Errorf("UsersGetByGroupIds failed, err: %v", err)
|
||||
return userNameByTarget
|
||||
}
|
||||
|
||||
for _, user := range users {
|
||||
logger.Warningf("user: %s", user.Username)
|
||||
for _, ch := range models.DefaultChannels {
|
||||
target, exist := user.ExtractToken(ch)
|
||||
if exist {
|
||||
if _, ok := userNameByTarget[target]; !ok {
|
||||
userNameByTarget[target] = make(map[string]struct{})
|
||||
}
|
||||
userNameByTarget[target][user.Username] = struct{}{}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return userNameByTarget
|
||||
}
|
||||
@@ -90,7 +90,8 @@ func (rt *Router) notifyChannelPuts(c *gin.Context) {
|
||||
var notifyChannels []models.NotifyChannel
|
||||
ginx.BindJSON(c, ¬ifyChannels)
|
||||
|
||||
channels := []string{models.Dingtalk, models.Wecom, models.Feishu, models.Mm, models.Telegram, models.Email}
|
||||
channels := []string{models.Dingtalk, models.Wecom, models.Feishu, models.Mm, models.Telegram,
|
||||
models.Email, models.Lark, models.LarkCard}
|
||||
|
||||
m := make(map[string]struct{})
|
||||
for _, v := range notifyChannels {
|
||||
@@ -126,7 +127,8 @@ func (rt *Router) notifyContactPuts(c *gin.Context) {
|
||||
var notifyContacts []models.NotifyContact
|
||||
ginx.BindJSON(c, ¬ifyContacts)
|
||||
|
||||
keys := []string{models.DingtalkKey, models.WecomKey, models.FeishuKey, models.MmKey, models.TelegramKey}
|
||||
keys := []string{models.DingtalkKey, models.WecomKey, models.FeishuKey, models.MmKey,
|
||||
models.TelegramKey, models.LarkKey}
|
||||
|
||||
m := make(map[string]struct{})
|
||||
for _, v := range notifyContacts {
|
||||
@@ -180,7 +182,7 @@ func (rt *Router) notifyConfigPut(c *gin.Context) {
|
||||
|
||||
smtp, errSmtp := SmtpValidate(text)
|
||||
ginx.Dangerous(errSmtp)
|
||||
go sender.RestartEmailSender(smtp)
|
||||
go sender.RestartEmailSender(rt.Ctx, smtp)
|
||||
}
|
||||
|
||||
ginx.NewRender(c).Message(nil)
|
||||
|
||||
@@ -68,7 +68,7 @@ func (rt *Router) notifyTplUpdate(c *gin.Context) {
|
||||
}
|
||||
|
||||
// get the count of the same channel and name but different id
|
||||
count, err := models.Count(models.DB(rt.Ctx).Model(&models.NotifyTpl{}).Where("channel = ? or name = ? and id <> ?", f.Channel, f.Name, f.Id))
|
||||
count, err := models.Count(models.DB(rt.Ctx).Model(&models.NotifyTpl{}).Where("(channel = ? or name = ?) and id <> ?", f.Channel, f.Name, f.Id))
|
||||
ginx.Dangerous(err)
|
||||
if count != 0 {
|
||||
ginx.Bomb(200, "Refuse to create duplicate channel or name")
|
||||
@@ -138,7 +138,7 @@ func (rt *Router) notifyTplPreview(c *gin.Context) {
|
||||
continue
|
||||
}
|
||||
|
||||
arr := strings.Split(pair, "=")
|
||||
arr := strings.SplitN(pair, "=", 2)
|
||||
if len(arr) != 2 {
|
||||
continue
|
||||
}
|
||||
|
||||
@@ -4,9 +4,11 @@ import (
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
"github.com/ccfos/nightingale/v6/pkg/flashduty"
|
||||
"github.com/ccfos/nightingale/v6/pkg/ormx"
|
||||
"github.com/ccfos/nightingale/v6/pkg/secu"
|
||||
|
||||
"github.com/gin-gonic/gin"
|
||||
"github.com/toolkits/pkg/ginx"
|
||||
"github.com/toolkits/pkg/logger"
|
||||
)
|
||||
|
||||
func (rt *Router) selfProfileGet(c *gin.Context) {
|
||||
@@ -58,5 +60,25 @@ func (rt *Router) selfPasswordPut(c *gin.Context) {
|
||||
var f selfPasswordForm
|
||||
ginx.BindJSON(c, &f)
|
||||
user := c.MustGet("user").(*models.User)
|
||||
ginx.NewRender(c).Message(user.ChangePassword(rt.Ctx, f.OldPass, f.NewPass))
|
||||
|
||||
newPassWord := f.NewPass
|
||||
oldPassWord := f.OldPass
|
||||
if rt.HTTP.RSA.OpenRSA {
|
||||
var err error
|
||||
newPassWord, err = secu.Decrypt(f.NewPass, rt.HTTP.RSA.RSAPrivateKey, rt.HTTP.RSA.RSAPassWord)
|
||||
if err != nil {
|
||||
logger.Errorf("RSA Decrypt failed: %v username: %s", err, user.Username)
|
||||
ginx.NewRender(c).Message(err)
|
||||
return
|
||||
}
|
||||
|
||||
oldPassWord, err = secu.Decrypt(f.OldPass, rt.HTTP.RSA.RSAPrivateKey, rt.HTTP.RSA.RSAPassWord)
|
||||
if err != nil {
|
||||
logger.Errorf("RSA Decrypt failed: %v username: %s", err, user.Username)
|
||||
ginx.NewRender(c).Message(err)
|
||||
return
|
||||
}
|
||||
}
|
||||
|
||||
ginx.NewRender(c).Message(user.ChangePassword(rt.Ctx, oldPassWord, newPassWord))
|
||||
}
|
||||
|
||||
@@ -49,6 +49,11 @@ func (rt *Router) targetGets(c *gin.Context) {
|
||||
downtime := ginx.QueryInt64(c, "downtime", 0)
|
||||
dsIds := queryDatasourceIds(c)
|
||||
|
||||
order := ginx.QueryStr(c, "order", "ident")
|
||||
desc := ginx.QueryBool(c, "desc", false)
|
||||
|
||||
hosts := queryStrListField(c, "hosts", ",", " ", "\n")
|
||||
|
||||
var err error
|
||||
if len(bgids) == 0 {
|
||||
user := c.MustGet("user").(*models.User)
|
||||
@@ -63,12 +68,27 @@ func (rt *Router) targetGets(c *gin.Context) {
|
||||
}
|
||||
}
|
||||
|
||||
total, err := models.TargetTotal(rt.Ctx, bgids, dsIds, query, downtime)
|
||||
options := []models.BuildTargetWhereOption{
|
||||
models.BuildTargetWhereWithBgids(bgids),
|
||||
models.BuildTargetWhereWithDsIds(dsIds),
|
||||
models.BuildTargetWhereWithQuery(query),
|
||||
models.BuildTargetWhereWithDowntime(downtime),
|
||||
models.BuildTargetWhereWithHosts(hosts),
|
||||
}
|
||||
total, err := models.TargetTotal(rt.Ctx, options...)
|
||||
ginx.Dangerous(err)
|
||||
|
||||
list, err := models.TargetGets(rt.Ctx, bgids, dsIds, query, downtime, limit, ginx.Offset(c, limit))
|
||||
list, err := models.TargetGets(rt.Ctx, limit,
|
||||
ginx.Offset(c, limit), order, desc, options...)
|
||||
ginx.Dangerous(err)
|
||||
|
||||
tgs, err := models.TargetBusiGroupsGetAll(rt.Ctx)
|
||||
ginx.Dangerous(err)
|
||||
|
||||
for _, t := range list {
|
||||
t.GroupIds = tgs[t.Ident]
|
||||
}
|
||||
|
||||
if err == nil {
|
||||
now := time.Now()
|
||||
cache := make(map[int64]*models.BusiGroup)
|
||||
@@ -148,7 +168,8 @@ func (rt *Router) targetGetsByService(c *gin.Context) {
|
||||
func (rt *Router) targetGetTags(c *gin.Context) {
|
||||
idents := ginx.QueryStr(c, "idents", "")
|
||||
idents = strings.ReplaceAll(idents, ",", " ")
|
||||
lst, err := models.TargetGetTags(rt.Ctx, strings.Fields(idents))
|
||||
ignoreHostTag := ginx.QueryBool(c, "ignore_host_tag", false)
|
||||
lst, err := models.TargetGetTags(rt.Ctx, strings.Fields(idents), ignoreHostTag)
|
||||
ginx.NewRender(c).Data(lst, err)
|
||||
}
|
||||
|
||||
@@ -251,9 +272,11 @@ func (rt *Router) validateTags(tags []string) error {
|
||||
}
|
||||
|
||||
func (rt *Router) addTagsToTarget(target *models.Target, tags []string) error {
|
||||
hostTagsMap := target.GetHostTagsMap()
|
||||
for _, tag := range tags {
|
||||
tagKey := strings.Split(tag, "=")[0]
|
||||
if strings.Contains(target.Tags, tagKey+"=") {
|
||||
if _, ok := hostTagsMap[tagKey]; ok ||
|
||||
strings.Contains(target.Tags, tagKey+"=") {
|
||||
return fmt.Errorf("duplicate tagkey(%s)", tagKey)
|
||||
}
|
||||
}
|
||||
@@ -370,8 +393,15 @@ type targetBgidForm struct {
|
||||
Bgid int64 `json:"bgid"`
|
||||
}
|
||||
|
||||
func (rt *Router) targetUpdateBgid(c *gin.Context) {
|
||||
var f targetBgidForm
|
||||
type targetBgidsForm struct {
|
||||
Idents []string `json:"idents" binding:"required_without=HostIps"`
|
||||
HostIps []string `json:"host_ips" binding:"required_without=Idents"`
|
||||
Bgids []int64 `json:"bgids"`
|
||||
Action string `json:"action"` // add del reset
|
||||
}
|
||||
|
||||
func (rt *Router) targetBindBgids(c *gin.Context) {
|
||||
var f targetBgidsForm
|
||||
var err error
|
||||
var failedResults = make(map[string]string)
|
||||
ginx.BindJSON(c, &f)
|
||||
@@ -387,35 +417,24 @@ func (rt *Router) targetUpdateBgid(c *gin.Context) {
|
||||
}
|
||||
|
||||
user := c.MustGet("user").(*models.User)
|
||||
if user.IsAdmin() {
|
||||
ginx.NewRender(c).Data(failedResults, models.TargetUpdateBgid(rt.Ctx, f.Idents, f.Bgid, false))
|
||||
return
|
||||
}
|
||||
|
||||
if f.Bgid > 0 {
|
||||
// 把要操作的机器分成两部分,一部分是bgid为0,需要管理员分配,另一部分bgid>0,说明是业务组内部想调整
|
||||
// 比如原来分配给didiyun的机器,didiyun的管理员想把部分机器调整到didiyun-ceph下
|
||||
// 对于调整的这种情况,当前登录用户要对这批机器有操作权限,同时还要对目标BG有操作权限
|
||||
orphans, err := models.IdentsFilter(rt.Ctx, f.Idents, "group_id = ?", 0)
|
||||
if !user.IsAdmin() {
|
||||
// 普通用户,检查用户是否有权限操作所有请求的业务组
|
||||
existing, _, err := models.SeparateTargetIdents(rt.Ctx, f.Idents)
|
||||
ginx.Dangerous(err)
|
||||
rt.checkTargetPerm(c, existing)
|
||||
|
||||
// 机器里边存在未归组的,登录用户就需要是admin
|
||||
if len(orphans) > 0 && !user.IsAdmin() {
|
||||
can, err := user.CheckPerm(rt.Ctx, "/targets/bind")
|
||||
var groupIds []int64
|
||||
if f.Action == "reset" {
|
||||
// 如果是复写,则需要检查用户是否有权限操作机器之前的业务组
|
||||
bgids, err := models.TargetGroupIdsGetByIdents(rt.Ctx, f.Idents)
|
||||
ginx.Dangerous(err)
|
||||
if !can {
|
||||
ginx.Bomb(http.StatusForbidden, "No permission. Only admin can assign BG")
|
||||
}
|
||||
|
||||
groupIds = append(groupIds, bgids...)
|
||||
}
|
||||
groupIds = append(groupIds, f.Bgids...)
|
||||
|
||||
reBelongs, err := models.IdentsFilter(rt.Ctx, f.Idents, "group_id > ?", 0)
|
||||
ginx.Dangerous(err)
|
||||
|
||||
if len(reBelongs) > 0 {
|
||||
// 对于这些要重新分配的机器,操作者要对这些机器本身有权限,同时要对目标bgid有权限
|
||||
rt.checkTargetPerm(c, f.Idents)
|
||||
|
||||
bg := BusiGroup(rt.Ctx, f.Bgid)
|
||||
for _, bgid := range groupIds {
|
||||
bg := BusiGroup(rt.Ctx, bgid)
|
||||
can, err := user.CanDoBusiGroup(rt.Ctx, bg, "rw")
|
||||
ginx.Dangerous(err)
|
||||
|
||||
@@ -423,14 +442,24 @@ func (rt *Router) targetUpdateBgid(c *gin.Context) {
|
||||
ginx.Bomb(http.StatusForbidden, "No permission. You are not admin of BG(%s)", bg.Name)
|
||||
}
|
||||
}
|
||||
} else if f.Bgid == 0 {
|
||||
// 退还机器
|
||||
rt.checkTargetPerm(c, f.Idents)
|
||||
} else {
|
||||
ginx.Bomb(http.StatusBadRequest, "invalid bgid")
|
||||
|
||||
can, err := user.CheckPerm(rt.Ctx, "/targets/bind")
|
||||
ginx.Dangerous(err)
|
||||
if !can {
|
||||
ginx.Bomb(http.StatusForbidden, "No permission. Only admin can assign BG")
|
||||
}
|
||||
}
|
||||
|
||||
ginx.NewRender(c).Data(failedResults, models.TargetUpdateBgid(rt.Ctx, f.Idents, f.Bgid, false))
|
||||
switch f.Action {
|
||||
case "add":
|
||||
ginx.NewRender(c).Data(failedResults, models.TargetBindBgids(rt.Ctx, f.Idents, f.Bgids))
|
||||
case "del":
|
||||
ginx.NewRender(c).Data(failedResults, models.TargetUnbindBgids(rt.Ctx, f.Idents, f.Bgids))
|
||||
case "reset":
|
||||
ginx.NewRender(c).Data(failedResults, models.TargetOverrideBgids(rt.Ctx, f.Idents, f.Bgids))
|
||||
default:
|
||||
ginx.Bomb(http.StatusBadRequest, "invalid action")
|
||||
}
|
||||
}
|
||||
|
||||
func (rt *Router) targetUpdateBgidByService(c *gin.Context) {
|
||||
@@ -449,7 +478,7 @@ func (rt *Router) targetUpdateBgidByService(c *gin.Context) {
|
||||
ginx.Bomb(http.StatusBadRequest, err.Error())
|
||||
}
|
||||
|
||||
ginx.NewRender(c).Data(failedResults, models.TargetUpdateBgid(rt.Ctx, f.Idents, f.Bgid, false))
|
||||
ginx.NewRender(c).Data(failedResults, models.TargetOverrideBgids(rt.Ctx, f.Idents, []int64{f.Bgid}))
|
||||
}
|
||||
|
||||
type identsForm struct {
|
||||
@@ -473,7 +502,7 @@ func (rt *Router) targetDel(c *gin.Context) {
|
||||
ginx.Bomb(http.StatusBadRequest, err.Error())
|
||||
}
|
||||
|
||||
ginx.NewRender(c).Data(failedResults, models.TargetDel(rt.Ctx, f.Idents))
|
||||
ginx.NewRender(c).Data(failedResults, models.TargetDel(rt.Ctx, f.Idents, rt.TargetDeleteHook))
|
||||
}
|
||||
|
||||
func (rt *Router) targetDelByService(c *gin.Context) {
|
||||
@@ -492,7 +521,7 @@ func (rt *Router) targetDelByService(c *gin.Context) {
|
||||
ginx.Bomb(http.StatusBadRequest, err.Error())
|
||||
}
|
||||
|
||||
ginx.NewRender(c).Data(failedResults, models.TargetDel(rt.Ctx, f.Idents))
|
||||
ginx.NewRender(c).Data(failedResults, models.TargetDel(rt.Ctx, f.Idents, rt.TargetDeleteHook))
|
||||
}
|
||||
|
||||
func (rt *Router) checkTargetPerm(c *gin.Context, idents []string) {
|
||||
|
||||
@@ -8,6 +8,7 @@ import (
|
||||
|
||||
"github.com/gin-gonic/gin"
|
||||
"github.com/toolkits/pkg/ginx"
|
||||
"github.com/toolkits/pkg/i18n"
|
||||
"github.com/toolkits/pkg/str"
|
||||
)
|
||||
|
||||
@@ -104,6 +105,11 @@ func (rt *Router) taskRecordAdd(c *gin.Context) {
|
||||
}
|
||||
|
||||
func (rt *Router) taskAdd(c *gin.Context) {
|
||||
if !rt.Ibex.Enable {
|
||||
ginx.Bomb(400, i18n.Sprintf(c.GetHeader("X-Language"), "This functionality has not been enabled. Please contact the system administrator to activate it."))
|
||||
return
|
||||
}
|
||||
|
||||
var f models.TaskForm
|
||||
ginx.BindJSON(c, &f)
|
||||
|
||||
|
||||
@@ -10,6 +10,7 @@ import (
|
||||
|
||||
"github.com/gin-gonic/gin"
|
||||
"github.com/toolkits/pkg/ginx"
|
||||
"github.com/toolkits/pkg/i18n"
|
||||
"github.com/toolkits/pkg/str"
|
||||
)
|
||||
|
||||
@@ -118,6 +119,11 @@ type taskTplForm struct {
|
||||
}
|
||||
|
||||
func (rt *Router) taskTplAdd(c *gin.Context) {
|
||||
if !rt.Ibex.Enable {
|
||||
ginx.Bomb(400, i18n.Sprintf(c.GetHeader("X-Language"), "This functionality has not been enabled. Please contact the system administrator to activate it."))
|
||||
return
|
||||
}
|
||||
|
||||
var f taskTplForm
|
||||
ginx.BindJSON(c, &f)
|
||||
|
||||
@@ -190,6 +196,13 @@ func (rt *Router) taskTplDel(c *gin.Context) {
|
||||
return
|
||||
}
|
||||
|
||||
ids, err := models.GetAlertRuleIdsByTaskId(rt.Ctx, tid)
|
||||
ginx.Dangerous(err)
|
||||
if len(ids) > 0 {
|
||||
ginx.NewRender(c).Message("can't del this task tpl, used by alert rule ids(%v) ", ids)
|
||||
return
|
||||
}
|
||||
|
||||
ginx.NewRender(c).Message(tpl.Del(rt.Ctx))
|
||||
}
|
||||
|
||||
|
||||
@@ -82,7 +82,7 @@ func (rt *Router) QueryData(c *gin.Context) {
|
||||
var err error
|
||||
tdClient := rt.TdendgineClients.GetCli(f.DatasourceId)
|
||||
for _, q := range f.Querys {
|
||||
datas, err := tdClient.Query(q)
|
||||
datas, err := tdClient.Query(q, 0)
|
||||
ginx.Dangerous(err)
|
||||
resp = append(resp, datas...)
|
||||
}
|
||||
|
||||
@@ -7,9 +7,11 @@ import (
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
"github.com/ccfos/nightingale/v6/pkg/flashduty"
|
||||
"github.com/ccfos/nightingale/v6/pkg/ormx"
|
||||
"github.com/ccfos/nightingale/v6/pkg/secu"
|
||||
|
||||
"github.com/gin-gonic/gin"
|
||||
"github.com/toolkits/pkg/ginx"
|
||||
"github.com/toolkits/pkg/logger"
|
||||
)
|
||||
|
||||
func (rt *Router) userBusiGroupsGets(c *gin.Context) {
|
||||
@@ -46,6 +48,7 @@ func (rt *Router) userGets(c *gin.Context) {
|
||||
order := ginx.QueryStr(c, "order", "username")
|
||||
desc := ginx.QueryBool(c, "desc", false)
|
||||
|
||||
go rt.UserCache.UpdateUsersLastActiveTime()
|
||||
total, err := models.UserTotal(rt.Ctx, query, stime, etime)
|
||||
ginx.Dangerous(err)
|
||||
|
||||
@@ -76,7 +79,18 @@ func (rt *Router) userAddPost(c *gin.Context) {
|
||||
var f userAddForm
|
||||
ginx.BindJSON(c, &f)
|
||||
|
||||
password, err := models.CryptoPass(rt.Ctx, f.Password)
|
||||
authPassWord := f.Password
|
||||
if rt.HTTP.RSA.OpenRSA {
|
||||
decPassWord, err := secu.Decrypt(f.Password, rt.HTTP.RSA.RSAPrivateKey, rt.HTTP.RSA.RSAPassWord)
|
||||
if err != nil {
|
||||
logger.Errorf("RSA Decrypt failed: %v username: %s", err, f.Username)
|
||||
ginx.NewRender(c).Message(err)
|
||||
return
|
||||
}
|
||||
authPassWord = decPassWord
|
||||
}
|
||||
|
||||
password, err := models.CryptoPass(rt.Ctx, authPassWord)
|
||||
ginx.Dangerous(err)
|
||||
|
||||
if len(f.Roles) == 0 {
|
||||
@@ -177,7 +191,18 @@ func (rt *Router) userPasswordPut(c *gin.Context) {
|
||||
|
||||
target := User(rt.Ctx, ginx.UrlParamInt64(c, "id"))
|
||||
|
||||
cryptoPass, err := models.CryptoPass(rt.Ctx, f.Password)
|
||||
authPassWord := f.Password
|
||||
if rt.HTTP.RSA.OpenRSA {
|
||||
decPassWord, err := secu.Decrypt(f.Password, rt.HTTP.RSA.RSAPrivateKey, rt.HTTP.RSA.RSAPassWord)
|
||||
if err != nil {
|
||||
logger.Errorf("RSA Decrypt failed: %v username: %s", err, target.Username)
|
||||
ginx.NewRender(c).Message(err)
|
||||
return
|
||||
}
|
||||
authPassWord = decPassWord
|
||||
}
|
||||
|
||||
cryptoPass, err := models.CryptoPass(rt.Ctx, authPassWord)
|
||||
ginx.Dangerous(err)
|
||||
|
||||
ginx.NewRender(c).Message(target.UpdatePassword(rt.Ctx, cryptoPass, c.MustGet("username").(string)))
|
||||
|
||||
@@ -1,6 +1,7 @@
|
||||
package sso
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"log"
|
||||
"time"
|
||||
|
||||
@@ -145,7 +146,7 @@ func Init(center cconf.Center, ctx *ctx.Context, configCache *memsto.ConfigCache
|
||||
}
|
||||
}
|
||||
if configCache == nil {
|
||||
logger.Error("configCache is nil, sso initialization failed")
|
||||
log.Fatalln(fmt.Errorf("configCache is nil, sso initialization failed"))
|
||||
}
|
||||
ssoClient.configCache = configCache
|
||||
userVariableMap := configCache.Get()
|
||||
|
||||
@@ -52,11 +52,13 @@ func Initialize(configDir string, cryptoKey string) (func(), error) {
|
||||
|
||||
targetCache := memsto.NewTargetCache(ctx, syncStats, redis)
|
||||
busiGroupCache := memsto.NewBusiGroupCache(ctx, syncStats)
|
||||
configCvalCache := memsto.NewCvalCache(ctx, syncStats)
|
||||
idents := idents.New(ctx, redis)
|
||||
metas := metas.New(redis)
|
||||
writers := writer.NewWriters(config.Pushgw)
|
||||
pushgwRouter := pushgwrt.New(config.HTTP, config.Pushgw, config.Alert, targetCache, busiGroupCache, idents, metas, writers, ctx)
|
||||
r := httpx.GinEngine(config.Global.RunMode, config.HTTP)
|
||||
r := httpx.GinEngine(config.Global.RunMode, config.HTTP, configCvalCache.PrintBodyPaths, configCvalCache.PrintAccessLog)
|
||||
|
||||
pushgwRouter.Config(r)
|
||||
|
||||
if !config.Alert.Disable {
|
||||
|
||||
41
cron/clean_notify_record.go
Normal file
41
cron/clean_notify_record.go
Normal file
@@ -0,0 +1,41 @@
|
||||
package cron
|
||||
|
||||
import (
|
||||
"time"
|
||||
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
"github.com/ccfos/nightingale/v6/pkg/ctx"
|
||||
|
||||
"github.com/robfig/cron/v3"
|
||||
"github.com/toolkits/pkg/logger"
|
||||
)
|
||||
|
||||
func cleanNotifyRecord(ctx *ctx.Context, day int) {
|
||||
lastWeek := time.Now().Unix() - 86400*int64(day)
|
||||
err := models.DB(ctx).Model(&models.NotificaitonRecord{}).Where("created_at < ?", lastWeek).Delete(&models.NotificaitonRecord{}).Error
|
||||
if err != nil {
|
||||
logger.Errorf("Failed to clean notify record: %v", err)
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
// 每天凌晨1点执行清理任务
|
||||
func CleanNotifyRecord(ctx *ctx.Context, day int) {
|
||||
c := cron.New()
|
||||
if day < 1 {
|
||||
day = 7
|
||||
}
|
||||
|
||||
// 使用cron表达式设置每天凌晨1点执行
|
||||
_, err := c.AddFunc("0 1 * * *", func() {
|
||||
cleanNotifyRecord(ctx, day)
|
||||
})
|
||||
|
||||
if err != nil {
|
||||
logger.Errorf("Failed to add clean notify record cron job: %v", err)
|
||||
return
|
||||
}
|
||||
|
||||
// 启动cron任务
|
||||
c.Start()
|
||||
}
|
||||
@@ -78,6 +78,6 @@ enable = true
|
||||
## ibex flush interval
|
||||
interval = "1000ms"
|
||||
## n9e ibex server rpc address
|
||||
servers = ["ibex:20090"]
|
||||
servers = ["nightingale:20090"]
|
||||
## temp script dir
|
||||
meta_dir = "./meta"
|
||||
|
||||
@@ -8,7 +8,7 @@ CREATE TABLE users (
|
||||
portrait varchar(255) not null default '',
|
||||
roles varchar(255) not null,
|
||||
contacts varchar(1024),
|
||||
maintainer boolean not null default false,
|
||||
maintainer int not null default 0,
|
||||
belong varchar(16) not null default '',
|
||||
last_active_time bigint not null default 0,
|
||||
create_at bigint not null default 0,
|
||||
@@ -60,8 +60,8 @@ CREATE TABLE configs (
|
||||
ckey varchar(191) not null,
|
||||
cval text not null default '',
|
||||
note varchar(1024) not null default '',
|
||||
external boolean not null default false,
|
||||
encrypted boolean not null default false,
|
||||
external int not null default 0,
|
||||
encrypted int not null default 0,
|
||||
create_at bigint not null default 0,
|
||||
create_by varchar(64) not null default '',
|
||||
update_at bigint not null default 0,
|
||||
@@ -378,7 +378,7 @@ COMMENT ON COLUMN alert_mute.disabled IS '0:enabled 1:disabled';
|
||||
CREATE TABLE alert_subscribe (
|
||||
id bigserial,
|
||||
name varchar(255) not null default '',
|
||||
disabled boolean not null default false,
|
||||
disabled int not null default 0,
|
||||
group_id bigint not null default 0,
|
||||
prod varchar(255) not null default '',
|
||||
cate varchar(128) not null,
|
||||
@@ -397,7 +397,7 @@ CREATE TABLE alert_subscribe (
|
||||
rule_ids VARCHAR(1024) DEFAULT '',
|
||||
webhooks text not null,
|
||||
extra_config text not null,
|
||||
redefine_webhooks boolean default false,
|
||||
redefine_webhooks int default 0,
|
||||
for_duration bigint not null default 0,
|
||||
create_at bigint not null default 0,
|
||||
create_by varchar(64) not null default '',
|
||||
@@ -561,7 +561,7 @@ CREATE TABLE alert_cur_event (
|
||||
target_note varchar(191) not null default '' ,
|
||||
first_trigger_time bigint,
|
||||
trigger_time bigint not null,
|
||||
trigger_value varchar(255) not null,
|
||||
trigger_value varchar(2048) not null,
|
||||
annotations text not null ,
|
||||
rule_config text not null ,
|
||||
tags varchar(1024) not null default '' ,
|
||||
@@ -621,7 +621,7 @@ CREATE TABLE alert_his_event (
|
||||
target_note varchar(191) not null default '' ,
|
||||
first_trigger_time bigint,
|
||||
trigger_time bigint not null,
|
||||
trigger_value varchar(255) not null,
|
||||
trigger_value varchar(2048) not null,
|
||||
recover_time bigint not null default 0,
|
||||
last_eval_time bigint not null default 0 ,
|
||||
tags varchar(1024) not null default '' ,
|
||||
@@ -744,7 +744,7 @@ CREATE TABLE datasource
|
||||
status varchar(255) not null default '',
|
||||
http varchar(4096) not null default '',
|
||||
auth varchar(8192) not null default '',
|
||||
is_default smallint not null default 0,
|
||||
is_default boolean not null default false,
|
||||
created_at bigint not null default 0,
|
||||
created_by varchar(64) not null default '',
|
||||
updated_at bigint not null default 0,
|
||||
@@ -845,7 +845,7 @@ CREATE TABLE metric_filter (
|
||||
update_by VARCHAR(191) NOT NULL DEFAULT ''
|
||||
);
|
||||
|
||||
CREATE INDEX idx_name ON metric_filter (name);
|
||||
CREATE INDEX idx_metric_filter_name ON metric_filter (name);
|
||||
|
||||
CREATE TABLE board_busigroup (
|
||||
busi_group_id BIGINT NOT NULL DEFAULT 0,
|
||||
@@ -870,6 +870,7 @@ CREATE INDEX idx_ident ON builtin_components (ident);
|
||||
CREATE TABLE builtin_payloads (
|
||||
id BIGSERIAL PRIMARY KEY,
|
||||
type VARCHAR(191) NOT NULL,
|
||||
uuid BIGINT NOT NULL DEFAULT 0,
|
||||
component VARCHAR(191) NOT NULL,
|
||||
cate VARCHAR(191) NOT NULL,
|
||||
name VARCHAR(191) NOT NULL,
|
||||
@@ -882,6 +883,6 @@ CREATE TABLE builtin_payloads (
|
||||
);
|
||||
|
||||
CREATE INDEX idx_component ON builtin_payloads (component);
|
||||
CREATE INDEX idx_name ON builtin_payloads (name);
|
||||
CREATE INDEX idx_builtin_payloads_name ON builtin_payloads (name);
|
||||
CREATE INDEX idx_cate ON builtin_payloads (cate);
|
||||
CREATE INDEX idx_type ON builtin_payloads (type);
|
||||
@@ -1,11 +1,25 @@
|
||||
#### {{if .IsRecovered}}<font color="#008800">S{{.Severity}} - Recovered - {{.RuleName}}</font>{{else}}<font color="#FF0000">S{{.Severity}} - Triggered - {{.RuleName}}</font>{{end}}
|
||||
#### {{if .IsRecovered}}<font color="#008800">💚{{.RuleName}}</font>{{else}}<font color="#FF0000">💔{{.RuleName}}</font>{{end}}
|
||||
|
||||
---
|
||||
|
||||
- **规则标题**: {{.RuleName}}{{if .RuleNote}}
|
||||
- **规则备注**: {{.RuleNote}}{{end}}
|
||||
- **监控指标**: {{.TagsJSON}}
|
||||
- {{if .IsRecovered}}**恢复时间**:{{timeformat .LastEvalTime}}{{else}}**触发时间**: {{timeformat .TriggerTime}}
|
||||
- **触发时值**: {{.TriggerValue}}{{end}}
|
||||
- **发送时间**: {{timestamp}}
|
||||
|
||||
{{$time_duration := sub now.Unix .FirstTriggerTime }}{{if .IsRecovered}}{{$time_duration = sub .LastEvalTime .FirstTriggerTime }}{{end}}
|
||||
- **告警级别**: {{.Severity}}级
|
||||
{{- if .RuleNote}}
|
||||
- **规则备注**: {{.RuleNote}}
|
||||
{{- end}}
|
||||
{{- if not .IsRecovered}}
|
||||
- **当次触发时值**: {{.TriggerValue}}
|
||||
- **当次触发时间**: {{timeformat .TriggerTime}}
|
||||
- **告警持续时长**: {{humanizeDurationInterface $time_duration}}
|
||||
{{- else}}
|
||||
{{- if .AnnotationsJSON.recovery_value}}
|
||||
- **恢复时值**: {{formatDecimal .AnnotationsJSON.recovery_value 4}}
|
||||
{{- end}}
|
||||
- **恢复时间**: {{timeformat .LastEvalTime}}
|
||||
- **告警持续时长**: {{humanizeDurationInterface $time_duration}}
|
||||
{{- end}}
|
||||
- **告警事件标签**:
|
||||
{{- range $key, $val := .TagsMap}}
|
||||
{{- if ne $key "rulename" }}
|
||||
- `{{$key}}`: `{{$val}}`
|
||||
{{- end}}
|
||||
{{- end}}
|
||||
@@ -1,6 +1,6 @@
|
||||
set names utf8mb4;
|
||||
|
||||
drop database if exists n9e_v6;
|
||||
-- drop database if exists n9e_v6;
|
||||
create database n9e_v6;
|
||||
use n9e_v6;
|
||||
|
||||
@@ -363,9 +363,11 @@ CREATE TABLE `target` (
|
||||
`ident` varchar(191) not null comment 'target id',
|
||||
`note` varchar(255) not null default '' comment 'append to alert event as field',
|
||||
`tags` varchar(512) not null default '' comment 'append to series data as tags, split by space, append external space at suffix',
|
||||
`host_tags` varchar(512) not null default '' comment 'append to series data as tags, split by space, append external space at suffix',
|
||||
`host_ip` varchar(15) default '' COMMENT 'IPv4 string',
|
||||
`agent_version` varchar(255) default '' COMMENT 'agent version',
|
||||
`engine_name` varchar(255) default '' COMMENT 'engine_name',
|
||||
`os` VARCHAR(31) DEFAULT '' COMMENT 'os type',
|
||||
`update_at` bigint not null default 0,
|
||||
PRIMARY KEY (`id`),
|
||||
UNIQUE KEY (`ident`),
|
||||
@@ -397,6 +399,7 @@ CREATE TABLE `recording_rule` (
|
||||
`disabled` tinyint(1) not null default 0 comment '0:enabled 1:disabled',
|
||||
`prom_ql` varchar(8192) not null comment 'promql',
|
||||
`prom_eval_interval` int not null comment 'evaluate interval',
|
||||
`cron_pattern` varchar(255) default '' comment 'cron pattern',
|
||||
`append_tags` varchar(255) default '' comment 'split by space: service=n9e mod=api',
|
||||
`query_configs` text not null comment 'query configs',
|
||||
`create_at` bigint default '0',
|
||||
@@ -440,7 +443,7 @@ CREATE TABLE `alert_cur_event` (
|
||||
`prom_for_duration` int not null comment 'prometheus for, unit:s',
|
||||
`prom_ql` varchar(8192) not null comment 'promql',
|
||||
`prom_eval_interval` int not null comment 'evaluate interval',
|
||||
`callbacks` varchar(255) not null default '' comment 'split by space: http://a.com/api/x http://a.com/api/y',
|
||||
`callbacks` varchar(2048) not null default '' comment 'split by space: http://a.com/api/x http://a.com/api/y',
|
||||
`runbook_url` varchar(255),
|
||||
`notify_recovered` tinyint(1) not null comment 'whether notify when recovery',
|
||||
`notify_channels` varchar(255) not null default '' comment 'split by space: sms voice email dingtalk wecom',
|
||||
@@ -451,10 +454,11 @@ CREATE TABLE `alert_cur_event` (
|
||||
`target_note` varchar(191) not null default '' comment 'target note',
|
||||
`first_trigger_time` bigint,
|
||||
`trigger_time` bigint not null,
|
||||
`trigger_value` varchar(255) not null,
|
||||
`trigger_value` text not null,
|
||||
`annotations` text not null comment 'annotations',
|
||||
`rule_config` text not null comment 'annotations',
|
||||
`tags` varchar(1024) not null default '' comment 'merge data_tags rule_tags, split by ,,',
|
||||
`original_tags` text comment 'labels key=val,,k2=v2',
|
||||
PRIMARY KEY (`id`),
|
||||
KEY (`hash`),
|
||||
KEY (`rule_id`),
|
||||
@@ -480,7 +484,7 @@ CREATE TABLE `alert_his_event` (
|
||||
`prom_for_duration` int not null comment 'prometheus for, unit:s',
|
||||
`prom_ql` varchar(8192) not null comment 'promql',
|
||||
`prom_eval_interval` int not null comment 'evaluate interval',
|
||||
`callbacks` varchar(255) not null default '' comment 'split by space: http://a.com/api/x http://a.com/api/y',
|
||||
`callbacks` varchar(2048) not null default '' comment 'split by space: http://a.com/api/x http://a.com/api/y',
|
||||
`runbook_url` varchar(255),
|
||||
`notify_recovered` tinyint(1) not null comment 'whether notify when recovery',
|
||||
`notify_channels` varchar(255) not null default '' comment 'split by space: sms voice email dingtalk wecom',
|
||||
@@ -490,10 +494,11 @@ CREATE TABLE `alert_his_event` (
|
||||
`target_note` varchar(191) not null default '' comment 'target note',
|
||||
`first_trigger_time` bigint,
|
||||
`trigger_time` bigint not null,
|
||||
`trigger_value` varchar(255) not null,
|
||||
`trigger_value` text not null,
|
||||
`recover_time` bigint not null default 0,
|
||||
`last_eval_time` bigint not null default 0 comment 'for time filter',
|
||||
`tags` varchar(1024) not null default '' comment 'merge data_tags rule_tags, split by ,,',
|
||||
`original_tags` text comment 'labels key=val,,k2=v2',
|
||||
`annotations` text not null comment 'annotations',
|
||||
`rule_config` text not null comment 'annotations',
|
||||
PRIMARY KEY (`id`),
|
||||
@@ -524,6 +529,8 @@ CREATE TABLE `builtin_components` (
|
||||
|
||||
CREATE TABLE `builtin_payloads` (
|
||||
`id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT '''unique identifier''',
|
||||
`component_id` bigint(20) NOT NULL DEFAULT 0 COMMENT 'component_id',
|
||||
`uuid` bigint(20) NOT NULL COMMENT '''uuid of payload''',
|
||||
`type` varchar(191) NOT NULL COMMENT '''type of payload''',
|
||||
`component` varchar(191) NOT NULL COMMENT '''component of payload''',
|
||||
`cate` varchar(191) NOT NULL COMMENT '''category of payload''',
|
||||
@@ -538,9 +545,22 @@ CREATE TABLE `builtin_payloads` (
|
||||
KEY `idx_component` (`component`),
|
||||
KEY `idx_name` (`name`),
|
||||
KEY `idx_cate` (`cate`),
|
||||
KEY `idx_uuid` (`uuid`),
|
||||
KEY `idx_type` (`type`)
|
||||
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
|
||||
|
||||
CREATE TABLE notification_record (
|
||||
`id` BIGINT PRIMARY KEY AUTO_INCREMENT,
|
||||
`event_id` BIGINT NOT NULL,
|
||||
`sub_id` BIGINT NOT NULL,
|
||||
`channel` VARCHAR(255) NOT NULL,
|
||||
`status` TINYINT NOT NULL DEFAULT 0,
|
||||
`target` VARCHAR(1024) NOT NULL,
|
||||
`details` VARCHAR(2048),
|
||||
`created_at` BIGINT NOT NULL,
|
||||
INDEX idx_evt (event_id)
|
||||
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
|
||||
|
||||
CREATE TABLE `task_tpl`
|
||||
(
|
||||
`id` int unsigned NOT NULL AUTO_INCREMENT,
|
||||
@@ -706,6 +726,15 @@ CREATE TABLE `metric_filter` (
|
||||
KEY `idx_name` (`name`)
|
||||
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
|
||||
|
||||
CREATE TABLE `target_busi_group` (
|
||||
`id` bigint NOT NULL AUTO_INCREMENT,
|
||||
`target_ident` varchar(191) NOT NULL,
|
||||
`group_id` bigint NOT NULL,
|
||||
`update_at` bigint NOT NULL,
|
||||
PRIMARY KEY (`id`),
|
||||
UNIQUE KEY `idx_target_group` (`target_ident`,`group_id`)
|
||||
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
|
||||
|
||||
CREATE TABLE `task_meta`
|
||||
(
|
||||
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
|
||||
|
||||
@@ -56,6 +56,7 @@ CREATE TABLE `builtin_components` (
|
||||
|
||||
CREATE TABLE `builtin_payloads` (
|
||||
`id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT '''unique identifier''',
|
||||
`uuid` bigint(20) NOT NULL COMMENT '''uuid of payload''',
|
||||
`type` varchar(191) NOT NULL COMMENT '''type of payload''',
|
||||
`component` varchar(191) NOT NULL COMMENT '''component of payload''',
|
||||
`cate` varchar(191) NOT NULL COMMENT '''category of payload''',
|
||||
@@ -70,8 +71,49 @@ CREATE TABLE `builtin_payloads` (
|
||||
KEY `idx_component` (`component`),
|
||||
KEY `idx_name` (`name`),
|
||||
KEY `idx_cate` (`cate`),
|
||||
KEY `idx_uuid` (`uuid`),
|
||||
KEY `idx_type` (`type`)
|
||||
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
|
||||
|
||||
/* v7.0.0-beta.7 */
|
||||
ALTER TABLE users ADD COLUMN last_active_time BIGINT NOT NULL DEFAULT 0;
|
||||
ALTER TABLE users ADD COLUMN last_active_time BIGINT NOT NULL DEFAULT 0;
|
||||
|
||||
/* v7.0.0-beta.13 */
|
||||
ALTER TABLE recording_rule ADD COLUMN cron_pattern VARCHAR(255) DEFAULT '' COMMENT 'cron pattern';
|
||||
|
||||
/* v7.0.0-beta.14 */
|
||||
ALTER TABLE alert_cur_event ADD COLUMN original_tags TEXT COMMENT 'labels key=val,,k2=v2';
|
||||
ALTER TABLE alert_his_event ADD COLUMN original_tags TEXT COMMENT 'labels key=val,,k2=v2';
|
||||
|
||||
/* v7.1.0 */
|
||||
ALTER TABLE target ADD COLUMN os VARCHAR(31) DEFAULT '' COMMENT 'os type';
|
||||
|
||||
/* v7.2.0 */
|
||||
CREATE TABLE notification_record (
|
||||
`id` BIGINT PRIMARY KEY AUTO_INCREMENT,
|
||||
`event_id` BIGINT NOT NULL,
|
||||
`sub_id` BIGINT NOT NULL,
|
||||
`channel` VARCHAR(255) NOT NULL,
|
||||
`status` TINYINT NOT NULL DEFAULT 0,
|
||||
`target` VARCHAR(1024) NOT NULL,
|
||||
`details` VARCHAR(2048),
|
||||
`created_at` BIGINT NOT NULL,
|
||||
INDEX idx_evt (event_id)
|
||||
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
|
||||
|
||||
|
||||
/* v7.3.0 2024-08-26 */
|
||||
ALTER TABLE `target` ADD COLUMN `host_tags` TEXT COMMENT 'global labels set in conf file';
|
||||
|
||||
/* v7.3.4 2024-08-28 */
|
||||
ALTER TABLE `builtin_payloads` ADD COLUMN `component_id` bigint(20) NOT NULL DEFAULT 0 COMMENT 'component_id';
|
||||
|
||||
/* v7.4.0 2024-09-20 */
|
||||
CREATE TABLE `target_busi_group` (
|
||||
`id` bigint NOT NULL AUTO_INCREMENT,
|
||||
`target_ident` varchar(191) NOT NULL,
|
||||
`group_id` bigint NOT NULL,
|
||||
`update_at` bigint NOT NULL,
|
||||
PRIMARY KEY (`id`),
|
||||
UNIQUE KEY `idx_target_group` (`target_ident`,`group_id`)
|
||||
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
|
||||
@@ -389,7 +389,7 @@ CREATE TABLE `alert_cur_event` (
|
||||
`target_note` varchar(191) not null default '',
|
||||
`first_trigger_time` bigint,
|
||||
`trigger_time` bigint not null,
|
||||
`trigger_value` varchar(255) not null,
|
||||
`trigger_value` varchar(2048) not null,
|
||||
`annotations` text not null,
|
||||
`rule_config` text not null,
|
||||
`tags` varchar(1024) not null default ''
|
||||
@@ -427,7 +427,7 @@ CREATE TABLE `alert_his_event` (
|
||||
`target_note` varchar(191) not null default '',
|
||||
`first_trigger_time` bigint,
|
||||
`trigger_time` bigint not null,
|
||||
`trigger_value` varchar(255) not null,
|
||||
`trigger_value` varchar(2048) not null,
|
||||
`recover_time` bigint not null default 0,
|
||||
`last_eval_time` bigint not null default 0,
|
||||
`tags` varchar(1024) not null default '',
|
||||
File diff suppressed because one or more lines are too long
10
go.mod
10
go.mod
@@ -8,7 +8,7 @@ require (
|
||||
github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc
|
||||
github.com/dgrijalva/jwt-go v3.2.0+incompatible
|
||||
github.com/expr-lang/expr v1.16.1
|
||||
github.com/flashcatcloud/ibex v1.3.4
|
||||
github.com/flashcatcloud/ibex v1.3.5
|
||||
github.com/gin-contrib/pprof v1.4.0
|
||||
github.com/gin-gonic/gin v1.9.1
|
||||
github.com/go-ldap/ldap/v3 v3.4.4
|
||||
@@ -18,6 +18,7 @@ require (
|
||||
github.com/golang/snappy v0.0.4
|
||||
github.com/google/uuid v1.3.0
|
||||
github.com/hashicorp/go-version v1.6.0
|
||||
github.com/jinzhu/copier v0.4.0
|
||||
github.com/json-iterator/go v1.1.12
|
||||
github.com/koding/multiconfig v0.0.0-20171124222453-69c27309b2d7
|
||||
github.com/mailru/easyjson v0.7.7
|
||||
@@ -32,7 +33,7 @@ require (
|
||||
github.com/redis/go-redis/v9 v9.0.2
|
||||
github.com/spaolacci/murmur3 v1.1.0
|
||||
github.com/tidwall/gjson v1.14.0
|
||||
github.com/toolkits/pkg v1.3.6
|
||||
github.com/toolkits/pkg v1.3.8
|
||||
golang.org/x/exp v0.0.0-20230713183714-613f0c0eb8a1
|
||||
golang.org/x/oauth2 v0.10.0
|
||||
gopkg.in/gomail.v2 v2.0.0-20160411212932-81ebce5c23df
|
||||
@@ -85,6 +86,7 @@ require (
|
||||
github.com/pquerna/cachecontrol v0.1.0 // indirect
|
||||
github.com/prometheus/client_model v0.4.0 // indirect
|
||||
github.com/prometheus/procfs v0.11.0 // indirect
|
||||
github.com/robfig/cron/v3 v3.0.1
|
||||
github.com/tidwall/match v1.1.1 // indirect
|
||||
github.com/tidwall/pretty v1.2.0 // indirect
|
||||
github.com/twitchyliquid64/golang-asm v0.15.1 // indirect
|
||||
@@ -93,10 +95,10 @@ require (
|
||||
go.uber.org/automaxprocs v1.5.2 // indirect
|
||||
golang.org/x/arch v0.3.0 // indirect
|
||||
golang.org/x/crypto v0.21.0 // indirect
|
||||
golang.org/x/image v0.13.0 // indirect
|
||||
golang.org/x/image v0.18.0 // indirect
|
||||
golang.org/x/net v0.23.0 // indirect
|
||||
golang.org/x/sys v0.18.0 // indirect
|
||||
golang.org/x/text v0.14.0 // indirect
|
||||
golang.org/x/text v0.16.0 // indirect
|
||||
google.golang.org/appengine v1.6.7 // indirect
|
||||
google.golang.org/protobuf v1.33.0 // indirect
|
||||
gopkg.in/alexcesaro/quotedprintable.v3 v3.0.0-20150716171945-2caba252f4dc // indirect
|
||||
|
||||
21
go.sum
21
go.sum
@@ -47,8 +47,8 @@ github.com/fatih/camelcase v1.0.0 h1:hxNvNX/xYBp0ovncs8WyWZrOrpBNub/JfaMvbURyft8
|
||||
github.com/fatih/camelcase v1.0.0/go.mod h1:yN2Sb0lFhZJUdVvtELVWefmrXpuZESvPmqwoZc+/fpc=
|
||||
github.com/fatih/structs v1.1.0 h1:Q7juDM0QtcnhCpeyLGQKyg4TOIghuNXrkL32pHAUMxo=
|
||||
github.com/fatih/structs v1.1.0/go.mod h1:9NiDSp5zOcgEDl+j00MP/WkGVPOlPRLejGD8Ga6PJ7M=
|
||||
github.com/flashcatcloud/ibex v1.3.4 h1:s5MgQmDIYR18liBKPNl96kC/h1jOTZjIOlUWeSx0710=
|
||||
github.com/flashcatcloud/ibex v1.3.4/go.mod h1:T8hbMUySK2q6cXUaYp0AUVeKkU9Od2LjzwmB5lmTRBM=
|
||||
github.com/flashcatcloud/ibex v1.3.5 h1:8GOOf5+aJT0TP/MC6izz7CO5JKJSdKVFBwL0vQp93Nc=
|
||||
github.com/flashcatcloud/ibex v1.3.5/go.mod h1:T8hbMUySK2q6cXUaYp0AUVeKkU9Od2LjzwmB5lmTRBM=
|
||||
github.com/gabriel-vasile/mimetype v1.4.2 h1:w5qFW6JKBz9Y393Y4q372O9A7cUSequkh1Q7OhCmWKU=
|
||||
github.com/gabriel-vasile/mimetype v1.4.2/go.mod h1:zApsH/mKG4w07erKIaJPFiX0Tsq9BFQgN3qGY5GnNgA=
|
||||
github.com/garyburd/redigo v1.6.2/go.mod h1:NR3MbYisc3/PwhQ00EMzDiPmrwpPxAn5GI05/YaO1SY=
|
||||
@@ -160,6 +160,8 @@ github.com/jackc/puddle v0.0.0-20190413234325-e4ced69a3a2b/go.mod h1:m4B5Dj62Y0f
|
||||
github.com/jackc/puddle v0.0.0-20190608224051-11cab39313c9/go.mod h1:m4B5Dj62Y0fbyuIc15OsIqK0+JU8nkqQjsgx7dvjSWk=
|
||||
github.com/jackc/puddle v1.1.3/go.mod h1:m4B5Dj62Y0fbyuIc15OsIqK0+JU8nkqQjsgx7dvjSWk=
|
||||
github.com/jackc/puddle v1.3.0/go.mod h1:m4B5Dj62Y0fbyuIc15OsIqK0+JU8nkqQjsgx7dvjSWk=
|
||||
github.com/jinzhu/copier v0.4.0 h1:w3ciUoD19shMCRargcpm0cm91ytaBhDvuRpz1ODO/U8=
|
||||
github.com/jinzhu/copier v0.4.0/go.mod h1:DfbEm0FYsaqBcKcFuvmOZb218JkPGtvSHsKg8S8hyyg=
|
||||
github.com/jinzhu/inflection v1.0.0 h1:K317FqzuhWc8YvSVlFMCCUb36O/S9MCKRDI7QkRKD/E=
|
||||
github.com/jinzhu/inflection v1.0.0/go.mod h1:h+uFLlag+Qp1Va5pdKtLDYj+kHp5pxUVkryuEj+Srlc=
|
||||
github.com/jinzhu/now v1.1.4/go.mod h1:d3SSVoowX0Lcu0IBviAWJpolVfI5UJVZZ7cO71lE/z8=
|
||||
@@ -250,6 +252,8 @@ github.com/rakyll/statik v0.1.7 h1:OF3QCZUuyPxuGEP7B4ypUa7sB/iHtqOTDYZXGM8KOdQ=
|
||||
github.com/rakyll/statik v0.1.7/go.mod h1:AlZONWzMtEnMs7W4e/1LURLiI49pIMmp6V9Unghqrcc=
|
||||
github.com/redis/go-redis/v9 v9.0.2 h1:BA426Zqe/7r56kCcvxYLWe1mkaz71LKF77GwgFzSxfE=
|
||||
github.com/redis/go-redis/v9 v9.0.2/go.mod h1:/xDTe9EF1LM61hek62Poq2nzQSGj0xSrEtEHbBQevps=
|
||||
github.com/robfig/cron/v3 v3.0.1 h1:WdRxkvbJztn8LMz/QEvLN5sBU+xKpSqwwUO1Pjr4qDs=
|
||||
github.com/robfig/cron/v3 v3.0.1/go.mod h1:eQICP3HwyT7UooqI/z+Ov+PtYAWygg1TEWWzGIFLtro=
|
||||
github.com/robfig/go-cache v0.0.0-20130306151617-9fc39e0dbf62/go.mod h1:65XQgovT59RWatovFwnwocoUxiI/eENTnOY5GK3STuY=
|
||||
github.com/rogpeppe/go-internal v1.3.0/go.mod h1:M8bDsm7K2OlrFYOpmOWEs/qY81heoFRclV5y23lUDJ4=
|
||||
github.com/rogpeppe/go-internal v1.6.1/go.mod h1:xXDCJY+GAPziupqXw64V24skbSoqbTEfhy4qGm1nDQc=
|
||||
@@ -290,8 +294,8 @@ github.com/tidwall/match v1.1.1 h1:+Ho715JplO36QYgwN9PGYNhgZvoUSc9X2c80KVTi+GA=
|
||||
github.com/tidwall/match v1.1.1/go.mod h1:eRSPERbgtNPcGhD8UCthc6PmLEQXEWd3PRB5JTxsfmM=
|
||||
github.com/tidwall/pretty v1.2.0 h1:RWIZEg2iJ8/g6fDDYzMpobmaoGh5OLl4AXtGUGPcqCs=
|
||||
github.com/tidwall/pretty v1.2.0/go.mod h1:ITEVvHYasfjBbM0u2Pg8T2nJnzm8xPwvNhhsoaGGjNU=
|
||||
github.com/toolkits/pkg v1.3.6 h1:47e1amsY6mJmcnF3Y2lIpkJXfoYY2RmgI09PtwdAEMU=
|
||||
github.com/toolkits/pkg v1.3.6/go.mod h1:M9ecwFGW1vxCTUFM9sr2ZjXSKb04N+1sTQ6SA3RNAIU=
|
||||
github.com/toolkits/pkg v1.3.8 h1:2yamC20c5mHRtbcGiLY99Lm/2mVitFn6onE8KKvMT1o=
|
||||
github.com/toolkits/pkg v1.3.8/go.mod h1:M9ecwFGW1vxCTUFM9sr2ZjXSKb04N+1sTQ6SA3RNAIU=
|
||||
github.com/twitchyliquid64/golang-asm v0.15.1 h1:SU5vSMR7hnwNxj24w34ZyCi/FmDZTkS4MhqMhdFk5YI=
|
||||
github.com/twitchyliquid64/golang-asm v0.15.1/go.mod h1:a1lVb/DtPvCB8fslRZhAngC2+aY1QWCk3Cedj/Gdt08=
|
||||
github.com/ugorji/go v1.2.7/go.mod h1:nF9osbDWLy6bDVv/Rtoh6QgnvNDpmCalQV5urGCCS6M=
|
||||
@@ -340,8 +344,9 @@ golang.org/x/crypto v0.21.0 h1:X31++rzVUdKhX5sWmSOFZxx8UW/ldWx55cbf08iNAMA=
|
||||
golang.org/x/crypto v0.21.0/go.mod h1:0BP7YvVV9gBbVKyeTG0Gyn+gZm94bibOW5BjDEYAOMs=
|
||||
golang.org/x/exp v0.0.0-20230713183714-613f0c0eb8a1 h1:MGwJjxBy0HJshjDNfLsYO8xppfqWlA5ZT9OhtUUhTNw=
|
||||
golang.org/x/exp v0.0.0-20230713183714-613f0c0eb8a1/go.mod h1:FXUEEKJgO7OQYeo8N01OfiKP8RXMtf6e8aTskBGqWdc=
|
||||
golang.org/x/image v0.13.0 h1:3cge/F/QTkNLauhf2QoE9zp+7sr+ZcL4HnoZmdwg9sg=
|
||||
golang.org/x/image v0.13.0/go.mod h1:6mmbMOeV28HuMTgA6OSRkdXKYw/t5W9Uwn2Yv1r3Yxk=
|
||||
golang.org/x/image v0.18.0 h1:jGzIakQa/ZXI1I0Fxvaa9W7yP25TqT6cHIHn+6CqvSQ=
|
||||
golang.org/x/image v0.18.0/go.mod h1:4yyo5vMFQjVjUcVk4jEQcU9MGy/rulF5WvUILseCM2E=
|
||||
golang.org/x/lint v0.0.0-20190930215403-16217165b5de/go.mod h1:6SW0HCj/g11FgYtHlgUYUwCkIfeOF89ocIRzGO/8vkc=
|
||||
golang.org/x/mod v0.0.0-20190513183733-4bf6d317e70e/go.mod h1:mXi4GBBbnImb6dmsKGUJ2LatrhH/nqhxcFungHvyanc=
|
||||
golang.org/x/mod v0.1.1-0.20191105210325-c90efee705ee/go.mod h1:QqPTAvyqsEbceGzBzNggFXnrqF1CaUcvgkdR5Ot7KZg=
|
||||
@@ -372,7 +377,7 @@ golang.org/x/sync v0.0.0-20190911185100-cd5d95a43a6e/go.mod h1:RxMgew5VJxzue5/jJ
|
||||
golang.org/x/sync v0.0.0-20201020160332-67f06af15bc9/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
|
||||
golang.org/x/sync v0.0.0-20220722155255-886fb9371eb4/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
|
||||
golang.org/x/sync v0.1.0/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
|
||||
golang.org/x/sync v0.3.0 h1:ftCYgMx6zT/asHUrPw8BLLscYtGznsLAnjq5RH9P66E=
|
||||
golang.org/x/sync v0.7.0 h1:YsImfSBoP9QPYL0xyKJPq0gcaJdG3rInoqxTWbfQu9M=
|
||||
golang.org/x/sys v0.0.0-20180905080454-ebe1bf3edb33/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
|
||||
golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
|
||||
golang.org/x/sys v0.0.0-20190222072716-a9d3bda3a223/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
|
||||
@@ -414,8 +419,8 @@ golang.org/x/text v0.7.0/go.mod h1:mrYo+phRRbMaCq/xk9113O4dZlRixOauAjOtrjsXDZ8=
|
||||
golang.org/x/text v0.8.0/go.mod h1:e1OnstbJyHTd6l/uOt8jFFHp6TRDWZR/bV3emEE/zU8=
|
||||
golang.org/x/text v0.9.0/go.mod h1:e1OnstbJyHTd6l/uOt8jFFHp6TRDWZR/bV3emEE/zU8=
|
||||
golang.org/x/text v0.13.0/go.mod h1:TvPlkZtksWOMsz7fbANvkp4WM8x/WCo/om8BMLbz+aE=
|
||||
golang.org/x/text v0.14.0 h1:ScX5w1eTa3QqT8oi6+ziP7dTV1S2+ALU0bI+0zXKWiQ=
|
||||
golang.org/x/text v0.14.0/go.mod h1:18ZOQIKpY8NJVqYksKHtTdi31H5itFRjB5/qKTNYzSU=
|
||||
golang.org/x/text v0.16.0 h1:a94ExnEXNtEwYLGJSIUxnWoxoRz/ZcCsV63ROupILh4=
|
||||
golang.org/x/text v0.16.0/go.mod h1:GhwF1Be+LQoKShO3cGOHzqOgRrGaYc9AvblQOmPVHnI=
|
||||
golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
|
||||
golang.org/x/tools v0.0.0-20190311212946-11955173bddd/go.mod h1:LCzVGOaR6xXOjkQ3onu1FJEFr0SW1gC7cKk1uF8kGRs=
|
||||
golang.org/x/tools v0.0.0-20190425163242-31fd60d6bfdc/go.mod h1:RgjU9mgBXZiqYHBnxXauZ1Gv1EHHAz9KjViQ78xBX0Q=
|
||||
|
||||
971
integrations/AliYun/dashboards/mysql.json
Normal file
971
integrations/AliYun/dashboards/mysql.json
Normal file
@@ -0,0 +1,971 @@
|
||||
{
|
||||
"name": "阿里云MySQL",
|
||||
"tags": "阿里云 mysql",
|
||||
"ident": "",
|
||||
"configs": {
|
||||
"panels": [
|
||||
{
|
||||
"type": "row",
|
||||
"id": "1cb8caf3-ef35-4572-9ecc-71b9f063a685",
|
||||
"name": "关键指标",
|
||||
"collapsed": true,
|
||||
"layout": {
|
||||
"h": 1,
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 0,
|
||||
"i": "1cb8caf3-ef35-4572-9ecc-71b9f063a685",
|
||||
"isResizable": false
|
||||
},
|
||||
"panels": []
|
||||
},
|
||||
{
|
||||
"type": "timeseries",
|
||||
"id": "5aad17df-354e-40de-a643-61da6668939b",
|
||||
"layout": {
|
||||
"h": 5,
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 1,
|
||||
"i": "fcf9515d-3a56-4596-8b3a-d7d8631aa218",
|
||||
"isResizable": true
|
||||
},
|
||||
"version": "3.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"datasourceValue": "${datasource}",
|
||||
"targets": [
|
||||
{
|
||||
"expr": "AliyunRds_MySQL_SlowQueries{instanceName=\"$instance\"}",
|
||||
"legend": "{{instanceName}}",
|
||||
"refId": "A",
|
||||
"maxDataPoints": 240
|
||||
}
|
||||
],
|
||||
"transformations": [
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {}
|
||||
}
|
||||
],
|
||||
"name": "每秒慢查询数量(countS)",
|
||||
"maxPerRow": 4,
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden",
|
||||
"behaviour": "showItem"
|
||||
},
|
||||
"standardOptions": {},
|
||||
"thresholds": {
|
||||
"steps": [
|
||||
{
|
||||
"color": "#634CD9",
|
||||
"value": null,
|
||||
"type": "base"
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "smooth",
|
||||
"spanNulls": false,
|
||||
"lineWidth": 2,
|
||||
"fillOpacity": 0.3,
|
||||
"gradientMode": "opacity",
|
||||
"stack": "off",
|
||||
"scaleDistribution": {
|
||||
"type": "linear"
|
||||
}
|
||||
},
|
||||
"overrides": [
|
||||
{
|
||||
"matcher": {
|
||||
"id": "byFrameRefID"
|
||||
},
|
||||
"properties": {
|
||||
"rightYAxisDisplay": "off"
|
||||
}
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"type": "row",
|
||||
"id": "2b3a816e-94e2-4c9d-9bb8-770c458033db",
|
||||
"name": "基础指标",
|
||||
"collapsed": true,
|
||||
"layout": {
|
||||
"h": 1,
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 6,
|
||||
"i": "2b3a816e-94e2-4c9d-9bb8-770c458033db",
|
||||
"isResizable": false
|
||||
},
|
||||
"panels": []
|
||||
},
|
||||
{
|
||||
"type": "timeseries",
|
||||
"id": "12d4a674-6d09-4b02-aa4f-d767531bd368",
|
||||
"layout": {
|
||||
"h": 4,
|
||||
"w": 8,
|
||||
"x": 0,
|
||||
"y": 7,
|
||||
"i": "baba4778-b950-4224-9dac-9ecda041f93b",
|
||||
"isResizable": true
|
||||
},
|
||||
"version": "3.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"datasourceValue": "${datasource}",
|
||||
"targets": [
|
||||
{
|
||||
"expr": "AliyunRds_CpuUsage{instanceName=\"$instance\"}",
|
||||
"legend": "",
|
||||
"refId": "A",
|
||||
"maxDataPoints": 240
|
||||
}
|
||||
],
|
||||
"transformations": [
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {}
|
||||
}
|
||||
],
|
||||
"name": "CPU使用率",
|
||||
"maxPerRow": 4,
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden",
|
||||
"behaviour": "showItem"
|
||||
},
|
||||
"standardOptions": {
|
||||
"util": "percent"
|
||||
},
|
||||
"thresholds": {
|
||||
"steps": [
|
||||
{
|
||||
"color": "#634CD9",
|
||||
"value": null,
|
||||
"type": "base"
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "smooth",
|
||||
"spanNulls": false,
|
||||
"lineWidth": 2,
|
||||
"fillOpacity": 0.3,
|
||||
"gradientMode": "opacity",
|
||||
"stack": "off",
|
||||
"scaleDistribution": {
|
||||
"type": "linear"
|
||||
}
|
||||
},
|
||||
"overrides": [
|
||||
{
|
||||
"matcher": {
|
||||
"id": "byFrameRefID"
|
||||
},
|
||||
"properties": {
|
||||
"rightYAxisDisplay": "off"
|
||||
}
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"type": "timeseries",
|
||||
"id": "55b17951-a4ae-46a7-a2d7-57db1414f6ff",
|
||||
"layout": {
|
||||
"h": 4,
|
||||
"w": 8,
|
||||
"x": 8,
|
||||
"y": 7,
|
||||
"i": "c4c248bd-21fb-4485-8235-f50640116e65",
|
||||
"isResizable": true
|
||||
},
|
||||
"version": "3.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"datasourceValue": "${datasource}",
|
||||
"targets": [
|
||||
{
|
||||
"expr": "AliyunRds_MemoryUsage{instanceName=\"$instance\"}",
|
||||
"legend": "",
|
||||
"refId": "A",
|
||||
"maxDataPoints": 240
|
||||
}
|
||||
],
|
||||
"transformations": [
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {}
|
||||
}
|
||||
],
|
||||
"name": "内存使用率",
|
||||
"maxPerRow": 4,
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden",
|
||||
"behaviour": "showItem"
|
||||
},
|
||||
"standardOptions": {
|
||||
"util": "percent"
|
||||
},
|
||||
"thresholds": {
|
||||
"steps": [
|
||||
{
|
||||
"color": "#634CD9",
|
||||
"value": null,
|
||||
"type": "base"
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "smooth",
|
||||
"spanNulls": false,
|
||||
"lineWidth": 2,
|
||||
"fillOpacity": 0.3,
|
||||
"gradientMode": "opacity",
|
||||
"stack": "off",
|
||||
"scaleDistribution": {
|
||||
"type": "linear"
|
||||
}
|
||||
},
|
||||
"overrides": [
|
||||
{
|
||||
"matcher": {
|
||||
"id": "byFrameRefID"
|
||||
},
|
||||
"properties": {
|
||||
"rightYAxisDisplay": "off"
|
||||
}
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"type": "timeseries",
|
||||
"id": "02c6af68-0e59-4f62-b0e8-80a9a9d0df82",
|
||||
"layout": {
|
||||
"h": 4,
|
||||
"w": 8,
|
||||
"x": 16,
|
||||
"y": 7,
|
||||
"i": "51cf9211-5e76-4176-b1ec-42929ccc6803",
|
||||
"isResizable": true
|
||||
},
|
||||
"version": "3.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"datasourceValue": "${datasource}",
|
||||
"targets": [
|
||||
{
|
||||
"expr": "AliyunRds_DiskUsage{instanceName=\"$instance\"}",
|
||||
"legend": "",
|
||||
"refId": "A",
|
||||
"maxDataPoints": 240
|
||||
}
|
||||
],
|
||||
"transformations": [
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {}
|
||||
}
|
||||
],
|
||||
"name": "磁盘使用率",
|
||||
"maxPerRow": 4,
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden",
|
||||
"behaviour": "showItem"
|
||||
},
|
||||
"standardOptions": {
|
||||
"util": "percent"
|
||||
},
|
||||
"thresholds": {
|
||||
"steps": [
|
||||
{
|
||||
"color": "#634CD9",
|
||||
"value": null,
|
||||
"type": "base"
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "smooth",
|
||||
"spanNulls": false,
|
||||
"lineWidth": 2,
|
||||
"fillOpacity": 0.3,
|
||||
"gradientMode": "opacity",
|
||||
"stack": "off",
|
||||
"scaleDistribution": {
|
||||
"type": "linear"
|
||||
}
|
||||
},
|
||||
"overrides": [
|
||||
{
|
||||
"matcher": {
|
||||
"id": "byFrameRefID"
|
||||
},
|
||||
"properties": {
|
||||
"rightYAxisDisplay": "off"
|
||||
}
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"type": "timeseries",
|
||||
"id": "b72c5032-1ea0-4c87-9cfd-d21b374680f1",
|
||||
"layout": {
|
||||
"h": 4,
|
||||
"w": 8,
|
||||
"x": 0,
|
||||
"y": 11,
|
||||
"i": "b72c5032-1ea0-4c87-9cfd-d21b374680f1",
|
||||
"isResizable": true
|
||||
},
|
||||
"version": "3.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"datasourceValue": "${datasource}",
|
||||
"targets": [
|
||||
{
|
||||
"expr": "AliyunRds_MySQL_ActiveSessions{instanceName=\"$instance\"}",
|
||||
"legend": "",
|
||||
"refId": "A",
|
||||
"maxDataPoints": 240
|
||||
}
|
||||
],
|
||||
"transformations": [
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {}
|
||||
}
|
||||
],
|
||||
"name": "活跃连接数",
|
||||
"maxPerRow": 4,
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden",
|
||||
"behaviour": "showItem"
|
||||
},
|
||||
"standardOptions": {},
|
||||
"thresholds": {
|
||||
"steps": [
|
||||
{
|
||||
"color": "#634CD9",
|
||||
"value": null,
|
||||
"type": "base"
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "smooth",
|
||||
"spanNulls": false,
|
||||
"lineWidth": 2,
|
||||
"fillOpacity": 0.3,
|
||||
"gradientMode": "opacity",
|
||||
"stack": "off",
|
||||
"scaleDistribution": {
|
||||
"type": "linear"
|
||||
}
|
||||
},
|
||||
"overrides": [
|
||||
{
|
||||
"matcher": {
|
||||
"id": "byFrameRefID"
|
||||
},
|
||||
"properties": {
|
||||
"rightYAxisDisplay": "off"
|
||||
}
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"type": "timeseries",
|
||||
"id": "b518c9c4-f0e8-4712-ab67-be4521eeff0c",
|
||||
"layout": {
|
||||
"h": 4,
|
||||
"w": 8,
|
||||
"x": 8,
|
||||
"y": 11,
|
||||
"i": "ff589719-6072-488d-819d-6e080a6f3c60",
|
||||
"isResizable": true
|
||||
},
|
||||
"version": "3.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"datasourceValue": "${datasource}",
|
||||
"targets": [
|
||||
{
|
||||
"expr": "AliyunRds_ConnectionUsage{instanceName=\"$instance\"}",
|
||||
"legend": "",
|
||||
"refId": "A",
|
||||
"maxDataPoints": 240
|
||||
}
|
||||
],
|
||||
"transformations": [
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {}
|
||||
}
|
||||
],
|
||||
"name": "连接数使用率",
|
||||
"maxPerRow": 4,
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden",
|
||||
"behaviour": "showItem"
|
||||
},
|
||||
"standardOptions": {},
|
||||
"thresholds": {
|
||||
"steps": [
|
||||
{
|
||||
"color": "#634CD9",
|
||||
"value": null,
|
||||
"type": "base"
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "smooth",
|
||||
"spanNulls": false,
|
||||
"lineWidth": 2,
|
||||
"fillOpacity": 0.3,
|
||||
"gradientMode": "opacity",
|
||||
"stack": "off",
|
||||
"scaleDistribution": {
|
||||
"type": "linear"
|
||||
}
|
||||
},
|
||||
"overrides": [
|
||||
{
|
||||
"matcher": {
|
||||
"id": "byFrameRefID"
|
||||
},
|
||||
"properties": {
|
||||
"rightYAxisDisplay": "off"
|
||||
}
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"type": "timeseries",
|
||||
"id": "86c1f728-ac1e-402b-bea6-2e3979f472c3",
|
||||
"layout": {
|
||||
"h": 4,
|
||||
"w": 8,
|
||||
"x": 16,
|
||||
"y": 11,
|
||||
"i": "5d673c5d-1fbb-4df4-9ece-c991d053ca34",
|
||||
"isResizable": true
|
||||
},
|
||||
"version": "3.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"datasourceValue": "${datasource}",
|
||||
"targets": [
|
||||
{
|
||||
"expr": "AliyunRds_IOPSUsage{instanceName=\"$instance\"} ",
|
||||
"legend": "",
|
||||
"refId": "A",
|
||||
"maxDataPoints": 240
|
||||
}
|
||||
],
|
||||
"transformations": [
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {}
|
||||
}
|
||||
],
|
||||
"name": "IOPS使用率",
|
||||
"maxPerRow": 4,
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden",
|
||||
"behaviour": "showItem"
|
||||
},
|
||||
"standardOptions": {
|
||||
"util": "percent"
|
||||
},
|
||||
"thresholds": {
|
||||
"steps": [
|
||||
{
|
||||
"color": "#634CD9",
|
||||
"value": null,
|
||||
"type": "base"
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "smooth",
|
||||
"spanNulls": false,
|
||||
"lineWidth": 2,
|
||||
"fillOpacity": 0.3,
|
||||
"gradientMode": "opacity",
|
||||
"stack": "off",
|
||||
"scaleDistribution": {
|
||||
"type": "linear"
|
||||
}
|
||||
},
|
||||
"overrides": [
|
||||
{
|
||||
"matcher": {
|
||||
"id": "byFrameRefID"
|
||||
},
|
||||
"properties": {
|
||||
"rightYAxisDisplay": "off",
|
||||
"standardOptions": {
|
||||
"util": "percent"
|
||||
}
|
||||
}
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"type": "timeseries",
|
||||
"id": "dc874418-8d11-409c-96e8-e48fac2f6e20",
|
||||
"layout": {
|
||||
"h": 4,
|
||||
"w": 8,
|
||||
"x": 0,
|
||||
"y": 15,
|
||||
"i": "86915dd4-990c-41ba-b048-3da301d97327",
|
||||
"isResizable": true
|
||||
},
|
||||
"version": "3.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"datasourceValue": "${datasource}",
|
||||
"targets": [
|
||||
{
|
||||
"expr": "AliyunRds_MySQL_NetworkInNew{instanceName=\"$instance\"}/ 8",
|
||||
"legend": "",
|
||||
"refId": "A",
|
||||
"maxDataPoints": 240
|
||||
}
|
||||
],
|
||||
"transformations": [
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {}
|
||||
}
|
||||
],
|
||||
"name": "网络流入带宽",
|
||||
"maxPerRow": 4,
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden",
|
||||
"behaviour": "showItem"
|
||||
},
|
||||
"standardOptions": {
|
||||
"util": "bytesSecIEC"
|
||||
},
|
||||
"thresholds": {
|
||||
"steps": [
|
||||
{
|
||||
"color": "#634CD9",
|
||||
"value": null,
|
||||
"type": "base"
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "smooth",
|
||||
"spanNulls": false,
|
||||
"lineWidth": 2,
|
||||
"fillOpacity": 0.3,
|
||||
"gradientMode": "opacity",
|
||||
"stack": "off",
|
||||
"scaleDistribution": {
|
||||
"type": "linear"
|
||||
}
|
||||
},
|
||||
"overrides": [
|
||||
{
|
||||
"matcher": {
|
||||
"id": "byFrameRefID"
|
||||
},
|
||||
"properties": {
|
||||
"rightYAxisDisplay": "off"
|
||||
}
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"type": "timeseries",
|
||||
"id": "b979878a-81a6-4c0d-960d-22a736d00655",
|
||||
"layout": {
|
||||
"h": 4,
|
||||
"w": 8,
|
||||
"x": 8,
|
||||
"y": 15,
|
||||
"i": "86f9e07f-85dc-44e0-8245-ca0a9b0dfa81",
|
||||
"isResizable": true
|
||||
},
|
||||
"version": "3.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"datasourceValue": "${datasource}",
|
||||
"targets": [
|
||||
{
|
||||
"expr": "AliyunRds_MySQL_NetworkOutNew{instanceName=\"$instance\"}/ 8",
|
||||
"legend": "",
|
||||
"refId": "A",
|
||||
"maxDataPoints": 240
|
||||
}
|
||||
],
|
||||
"transformations": [
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {}
|
||||
}
|
||||
],
|
||||
"name": "网络流出带宽",
|
||||
"maxPerRow": 4,
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden",
|
||||
"behaviour": "showItem"
|
||||
},
|
||||
"standardOptions": {
|
||||
"util": "bytesSecIEC"
|
||||
},
|
||||
"thresholds": {
|
||||
"steps": [
|
||||
{
|
||||
"color": "#634CD9",
|
||||
"value": null,
|
||||
"type": "base"
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "smooth",
|
||||
"spanNulls": false,
|
||||
"lineWidth": 2,
|
||||
"fillOpacity": 0.3,
|
||||
"gradientMode": "opacity",
|
||||
"stack": "off",
|
||||
"scaleDistribution": {
|
||||
"type": "linear"
|
||||
}
|
||||
},
|
||||
"overrides": [
|
||||
{
|
||||
"matcher": {
|
||||
"id": "byFrameRefID"
|
||||
},
|
||||
"properties": {
|
||||
"rightYAxisDisplay": "off"
|
||||
}
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"type": "row",
|
||||
"id": "6d896a20-bf04-4dc7-94da-1394ef109848",
|
||||
"name": "性能指标",
|
||||
"collapsed": true,
|
||||
"layout": {
|
||||
"h": 1,
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 19,
|
||||
"i": "6d896a20-bf04-4dc7-94da-1394ef109848",
|
||||
"isResizable": false
|
||||
},
|
||||
"panels": []
|
||||
},
|
||||
{
|
||||
"type": "timeseries",
|
||||
"id": "2e545b2b-130b-4829-a2d2-ee5305c302aa",
|
||||
"layout": {
|
||||
"h": 4,
|
||||
"w": 8,
|
||||
"x": 0,
|
||||
"y": 20,
|
||||
"i": "13dceb72-9e9d-483d-86d2-b192debdcece",
|
||||
"isResizable": true
|
||||
},
|
||||
"version": "3.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"datasourceValue": "${datasource}",
|
||||
"targets": [
|
||||
{
|
||||
"expr": "AliyunRds_MySQL_QPS{instanceName=\"$instance\"}",
|
||||
"legend": "",
|
||||
"refId": "A",
|
||||
"maxDataPoints": 240
|
||||
}
|
||||
],
|
||||
"transformations": [
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {}
|
||||
}
|
||||
],
|
||||
"name": "QPS",
|
||||
"maxPerRow": 4,
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden",
|
||||
"behaviour": "showItem"
|
||||
},
|
||||
"standardOptions": {
|
||||
"util": "reqps"
|
||||
},
|
||||
"thresholds": {
|
||||
"steps": [
|
||||
{
|
||||
"color": "#634CD9",
|
||||
"value": null,
|
||||
"type": "base"
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "smooth",
|
||||
"spanNulls": false,
|
||||
"lineWidth": 2,
|
||||
"fillOpacity": 0.3,
|
||||
"gradientMode": "opacity",
|
||||
"stack": "off",
|
||||
"scaleDistribution": {
|
||||
"type": "linear"
|
||||
}
|
||||
},
|
||||
"overrides": [
|
||||
{
|
||||
"matcher": {
|
||||
"id": "byFrameRefID"
|
||||
},
|
||||
"properties": {
|
||||
"rightYAxisDisplay": "off"
|
||||
}
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"type": "timeseries",
|
||||
"id": "0299da4b-d779-4ed7-9cd5-096f43181b2e",
|
||||
"layout": {
|
||||
"h": 4,
|
||||
"w": 8,
|
||||
"x": 8,
|
||||
"y": 20,
|
||||
"i": "2b23c24e-b6f9-44f5-8151-2d5a7585c31a",
|
||||
"isResizable": true
|
||||
},
|
||||
"version": "3.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"datasourceValue": "${datasource}",
|
||||
"targets": [
|
||||
{
|
||||
"expr": "AliyunRds_MySQL_TPS{instanceName=\"$instance\"}",
|
||||
"legend": "",
|
||||
"refId": "A",
|
||||
"maxDataPoints": 240
|
||||
}
|
||||
],
|
||||
"transformations": [
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {}
|
||||
}
|
||||
],
|
||||
"name": "TPS",
|
||||
"maxPerRow": 4,
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden",
|
||||
"behaviour": "showItem"
|
||||
},
|
||||
"standardOptions": {
|
||||
"util": "reqps"
|
||||
},
|
||||
"thresholds": {
|
||||
"steps": [
|
||||
{
|
||||
"color": "#634CD9",
|
||||
"value": null,
|
||||
"type": "base"
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "smooth",
|
||||
"spanNulls": false,
|
||||
"lineWidth": 2,
|
||||
"fillOpacity": 0.3,
|
||||
"gradientMode": "opacity",
|
||||
"stack": "off",
|
||||
"scaleDistribution": {
|
||||
"type": "linear"
|
||||
}
|
||||
},
|
||||
"overrides": [
|
||||
{
|
||||
"matcher": {
|
||||
"id": "byFrameRefID"
|
||||
},
|
||||
"properties": {
|
||||
"rightYAxisDisplay": "off"
|
||||
}
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"type": "timeseries",
|
||||
"id": "56a0e345-1d4d-4051-a3cf-738bea220f96",
|
||||
"layout": {
|
||||
"h": 4,
|
||||
"w": 8,
|
||||
"x": 16,
|
||||
"y": 20,
|
||||
"i": "d1752ed4-f4a1-4c4b-854f-1c2ef01b34a4",
|
||||
"isResizable": true
|
||||
},
|
||||
"version": "3.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"datasourceValue": "${datasource}",
|
||||
"targets": [
|
||||
{
|
||||
"expr": "AliyunRds_MySQL_IbufUseRatio{instanceName=\"$instance\"}",
|
||||
"legend": "",
|
||||
"refId": "A",
|
||||
"maxDataPoints": 240
|
||||
}
|
||||
],
|
||||
"transformations": [
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {}
|
||||
}
|
||||
],
|
||||
"name": "BP利用率",
|
||||
"maxPerRow": 4,
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden",
|
||||
"behaviour": "showItem"
|
||||
},
|
||||
"standardOptions": {},
|
||||
"thresholds": {
|
||||
"steps": [
|
||||
{
|
||||
"color": "#634CD9",
|
||||
"value": null,
|
||||
"type": "base"
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "smooth",
|
||||
"spanNulls": false,
|
||||
"lineWidth": 2,
|
||||
"fillOpacity": 0.3,
|
||||
"gradientMode": "opacity",
|
||||
"stack": "off",
|
||||
"scaleDistribution": {
|
||||
"type": "linear"
|
||||
}
|
||||
},
|
||||
"overrides": [
|
||||
{
|
||||
"matcher": {
|
||||
"id": "byFrameRefID"
|
||||
},
|
||||
"properties": {
|
||||
"rightYAxisDisplay": "off"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"var": [
|
||||
{
|
||||
"name": "datasource",
|
||||
"label": "datasource",
|
||||
"type": "datasource",
|
||||
"hide": false,
|
||||
"definition": "prometheus"
|
||||
},
|
||||
{
|
||||
"name": "instance",
|
||||
"label": "",
|
||||
"type": "query",
|
||||
"hide": false,
|
||||
"datasource": {
|
||||
"cate": "prometheus",
|
||||
"value": "${datasource}"
|
||||
},
|
||||
"definition": "label_values(AliyunRds_MySQL_SlowQueries, instanceName)"
|
||||
}
|
||||
],
|
||||
"version": "3.0.0"
|
||||
},
|
||||
"uuid": 1717556327098444000
|
||||
}
|
||||
|
Before Width: | Height: | Size: 2.8 KiB After Width: | Height: | Size: 2.8 KiB |
@@ -584,7 +584,7 @@
|
||||
"links": [
|
||||
{
|
||||
"title": "下钻",
|
||||
"url": "/dashboards/automq-group-metrics?TSDB=${DS_PROMETHEUS}\u0026cluster_id=${cluster_id}\u0026group_id=${__field.labels.consumer_group}\u0026partition=all\u0026topic=${__field.labels.topic}"
|
||||
"url": "/built-in-components/dashboard/detail?__uuid__=1717556327172992000&TSDB=${DS_PROMETHEUS}\u0026cluster_id=${cluster_id}\u0026group_id=${__field.labels.consumer_group}\u0026partition=all\u0026topic=${__field.labels.topic}"
|
||||
}
|
||||
],
|
||||
"showHeader": true
|
||||
@@ -669,7 +669,7 @@
|
||||
"links": [
|
||||
{
|
||||
"title": "下钻",
|
||||
"url": "/dashboards/automq-topic-metrics?TSDB=${DS_PROMETHEUS}\u0026cluster_id=${cluster_id}\u0026topic=${__field.labels.topic}"
|
||||
"url": "/built-in-components/dashboard/detail?__uuid__=1717556327174664000&TSDB=${DS_PROMETHEUS}\u0026cluster_id=${cluster_id}\u0026topic=${__field.labels.topic}"
|
||||
}
|
||||
],
|
||||
"showHeader": true
|
||||
@@ -781,7 +781,7 @@
|
||||
"links": [
|
||||
{
|
||||
"title": "下钻",
|
||||
"url": "/dashboards/automq-broker-metrics?DS_PROMETHEUS=${DS_PROMETHEUS}\u0026cluster_id=${cluster_id}\u0026node_id=${__field.labels.instance}"
|
||||
"url": "/built-in-components/dashboard/detail?__uuid__=1717556327159415000&DS_PROMETHEUS=${DS_PROMETHEUS}\u0026cluster_id=${cluster_id}\u0026node_id=${__field.labels.instance}"
|
||||
}
|
||||
],
|
||||
"showHeader": true
|
||||
|
||||
361
integrations/ClickHouse/alerts/clickhouse_by_categraf.json
Normal file
361
integrations/ClickHouse/alerts/clickhouse_by_categraf.json
Normal file
@@ -0,0 +1,361 @@
|
||||
[
|
||||
{
|
||||
"id": 0,
|
||||
"group_id": 0,
|
||||
"cate": "prometheus",
|
||||
"datasource_ids": [
|
||||
0
|
||||
],
|
||||
"cluster": "",
|
||||
"name": "ClickHouse Categraf ZooKeeper故障",
|
||||
"note": "",
|
||||
"prod": "metric",
|
||||
"algorithm": "",
|
||||
"algo_params": null,
|
||||
"delay": 0,
|
||||
"severity": 0,
|
||||
"severities": [
|
||||
2
|
||||
],
|
||||
"disabled": 1,
|
||||
"prom_for_duration": 60,
|
||||
"prom_ql": "",
|
||||
"rule_config": {
|
||||
"queries": [
|
||||
{
|
||||
"keys": {
|
||||
"labelKey": "",
|
||||
"valueKey": ""
|
||||
},
|
||||
"prom_ql": "avg(clickhouse_metrics_zoo_keeper_session ) != 1",
|
||||
"severity": 2
|
||||
}
|
||||
]
|
||||
},
|
||||
"prom_eval_interval": 30,
|
||||
"enable_stime": "00:00",
|
||||
"enable_stimes": [
|
||||
"00:00"
|
||||
],
|
||||
"enable_etime": "00:00",
|
||||
"enable_etimes": [
|
||||
"00:00"
|
||||
],
|
||||
"enable_days_of_week": [
|
||||
"0",
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6"
|
||||
],
|
||||
"enable_days_of_weeks": [
|
||||
[
|
||||
"0",
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6"
|
||||
]
|
||||
],
|
||||
"enable_in_bg": 0,
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [],
|
||||
"notify_groups_obj": null,
|
||||
"notify_groups": null,
|
||||
"notify_repeat_step": 60,
|
||||
"notify_max_number": 0,
|
||||
"recover_duration": 0,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": [],
|
||||
"annotations": {},
|
||||
"extra_config": null,
|
||||
"create_at": 0,
|
||||
"create_by": "",
|
||||
"update_at": 0,
|
||||
"update_by": "",
|
||||
"uuid": 1719305153856411000
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"group_id": 0,
|
||||
"cate": "prometheus",
|
||||
"datasource_ids": [
|
||||
0
|
||||
],
|
||||
"cluster": "",
|
||||
"name": "ClickHouse Categraf 内存使用",
|
||||
"note": "内存使用报警",
|
||||
"prod": "metric",
|
||||
"algorithm": "",
|
||||
"algo_params": null,
|
||||
"delay": 0,
|
||||
"severity": 0,
|
||||
"severities": [
|
||||
1,
|
||||
2
|
||||
],
|
||||
"disabled": 1,
|
||||
"prom_for_duration": 60,
|
||||
"prom_ql": "",
|
||||
"rule_config": {
|
||||
"queries": [
|
||||
{
|
||||
"keys": {
|
||||
"labelKey": "",
|
||||
"valueKey": ""
|
||||
},
|
||||
"prom_ql": "clickhouse_metrics_memory_tracking / clickhouse_asynchronous_metrics_os_memory_total * 100 \u003e 90",
|
||||
"severity": 1
|
||||
},
|
||||
{
|
||||
"keys": {
|
||||
"labelKey": "",
|
||||
"valueKey": ""
|
||||
},
|
||||
"prom_ql": "clickhouse_metrics_memory_tracking/ clickhouse_asynchronous_metrics_os_memory_total * 100 \u003e 80",
|
||||
"severity": 2
|
||||
}
|
||||
]
|
||||
},
|
||||
"prom_eval_interval": 30,
|
||||
"enable_stime": "00:00",
|
||||
"enable_stimes": [
|
||||
"00:00"
|
||||
],
|
||||
"enable_etime": "00:00",
|
||||
"enable_etimes": [
|
||||
"00:00"
|
||||
],
|
||||
"enable_days_of_week": [
|
||||
"0",
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6"
|
||||
],
|
||||
"enable_days_of_weeks": [
|
||||
[
|
||||
"0",
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6"
|
||||
]
|
||||
],
|
||||
"enable_in_bg": 0,
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [],
|
||||
"notify_groups_obj": null,
|
||||
"notify_groups": null,
|
||||
"notify_repeat_step": 60,
|
||||
"notify_max_number": 0,
|
||||
"recover_duration": 0,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": [],
|
||||
"annotations": {},
|
||||
"extra_config": null,
|
||||
"create_at": 0,
|
||||
"create_by": "",
|
||||
"update_at": 0,
|
||||
"update_by": "",
|
||||
"uuid": 1719305153858877000
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"group_id": 0,
|
||||
"cate": "prometheus",
|
||||
"datasource_ids": [
|
||||
0
|
||||
],
|
||||
"cluster": "",
|
||||
"name": "ClickHouse Categraf 磁盘使用",
|
||||
"note": "磁盘使用报警",
|
||||
"prod": "metric",
|
||||
"algorithm": "",
|
||||
"algo_params": null,
|
||||
"delay": 0,
|
||||
"severity": 0,
|
||||
"severities": [
|
||||
1,
|
||||
2
|
||||
],
|
||||
"disabled": 1,
|
||||
"prom_for_duration": 60,
|
||||
"prom_ql": "",
|
||||
"rule_config": {
|
||||
"queries": [
|
||||
{
|
||||
"keys": {
|
||||
"labelKey": "",
|
||||
"valueKey": ""
|
||||
},
|
||||
"prom_ql": "clickhouse_asynchronous_metrics_disk_available_default / (clickhouse_asynchronous_metrics_disk_available_default + clickhouse_asynchronous_metrics_disk_used_default) * 100 \u003c 10",
|
||||
"severity": 1
|
||||
},
|
||||
{
|
||||
"keys": {
|
||||
"labelKey": "",
|
||||
"valueKey": ""
|
||||
},
|
||||
"prom_ql": "clickhouse_asynchronous_metrics_disk_available_default / (clickhouse_asynchronous_metrics_disk_available_default + clickhouse_asynchronous_metrics_disk_used_default) * 100 \u003c 20",
|
||||
"severity": 2
|
||||
}
|
||||
]
|
||||
},
|
||||
"prom_eval_interval": 30,
|
||||
"enable_stime": "00:00",
|
||||
"enable_stimes": [
|
||||
"00:00"
|
||||
],
|
||||
"enable_etime": "00:00",
|
||||
"enable_etimes": [
|
||||
"00:00"
|
||||
],
|
||||
"enable_days_of_week": [
|
||||
"0",
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6"
|
||||
],
|
||||
"enable_days_of_weeks": [
|
||||
[
|
||||
"0",
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6"
|
||||
]
|
||||
],
|
||||
"enable_in_bg": 0,
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [],
|
||||
"notify_groups_obj": null,
|
||||
"notify_groups": null,
|
||||
"notify_repeat_step": 60,
|
||||
"notify_max_number": 0,
|
||||
"recover_duration": 0,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": [],
|
||||
"annotations": {},
|
||||
"extra_config": null,
|
||||
"create_at": 0,
|
||||
"create_by": "",
|
||||
"update_at": 0,
|
||||
"update_by": "",
|
||||
"uuid": 1719305153860224000
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"group_id": 0,
|
||||
"cate": "prometheus",
|
||||
"datasource_ids": [
|
||||
0
|
||||
],
|
||||
"cluster": "",
|
||||
"name": "ClickHouse Categraf 网络故障",
|
||||
"note": "",
|
||||
"prod": "metric",
|
||||
"algorithm": "",
|
||||
"algo_params": null,
|
||||
"delay": 0,
|
||||
"severity": 0,
|
||||
"severities": [
|
||||
3,
|
||||
2
|
||||
],
|
||||
"disabled": 1,
|
||||
"prom_for_duration": 60,
|
||||
"prom_ql": "",
|
||||
"rule_config": {
|
||||
"queries": [
|
||||
{
|
||||
"keys": {
|
||||
"labelKey": "",
|
||||
"valueKey": ""
|
||||
},
|
||||
"prom_ql": "clickhouse_metrics_network_send \u003e 250 or clickhouse_metrics_network_receive \u003e 250",
|
||||
"severity": 2
|
||||
},
|
||||
{
|
||||
"keys": {
|
||||
"labelKey": "",
|
||||
"valueKey": ""
|
||||
},
|
||||
"prom_ql": "clickhouse_metrics_network_send \u003e 250 or clickhouse_metrics_network_receive \u003e 250",
|
||||
"severity": 3
|
||||
},
|
||||
{
|
||||
"keys": {
|
||||
"labelKey": "",
|
||||
"valueKey": ""
|
||||
},
|
||||
"prom_ql": "increase(clickhouse_metrics_interserver_connection[5m]) \u003e 0",
|
||||
"severity": 3
|
||||
}
|
||||
]
|
||||
},
|
||||
"prom_eval_interval": 30,
|
||||
"enable_stime": "00:00",
|
||||
"enable_stimes": [
|
||||
"00:00"
|
||||
],
|
||||
"enable_etime": "00:00",
|
||||
"enable_etimes": [
|
||||
"00:00"
|
||||
],
|
||||
"enable_days_of_week": [
|
||||
"0",
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6"
|
||||
],
|
||||
"enable_days_of_weeks": [
|
||||
[
|
||||
"0",
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6"
|
||||
]
|
||||
],
|
||||
"enable_in_bg": 0,
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [],
|
||||
"notify_groups_obj": null,
|
||||
"notify_groups": null,
|
||||
"notify_repeat_step": 60,
|
||||
"notify_max_number": 0,
|
||||
"recover_duration": 0,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": [],
|
||||
"annotations": {},
|
||||
"extra_config": null,
|
||||
"create_at": 0,
|
||||
"create_by": "",
|
||||
"update_at": 0,
|
||||
"update_by": "",
|
||||
"uuid": 1719305153861525000
|
||||
}
|
||||
]
|
||||
521
integrations/ClickHouse/alerts/clickhouse_by_exporter.json
Normal file
521
integrations/ClickHouse/alerts/clickhouse_by_exporter.json
Normal file
@@ -0,0 +1,521 @@
|
||||
[
|
||||
{
|
||||
"id": 0,
|
||||
"group_id": 0,
|
||||
"cate": "prometheus",
|
||||
"datasource_ids": [
|
||||
0
|
||||
],
|
||||
"cluster": "",
|
||||
"name": "ClickHouse Exporter 认证错误",
|
||||
"note": "",
|
||||
"prod": "metric",
|
||||
"algorithm": "",
|
||||
"algo_params": null,
|
||||
"delay": 0,
|
||||
"severity": 0,
|
||||
"severities": [
|
||||
2,
|
||||
3
|
||||
],
|
||||
"disabled": 1,
|
||||
"prom_for_duration": 60,
|
||||
"prom_ql": "",
|
||||
"rule_config": {
|
||||
"queries": [
|
||||
{
|
||||
"keys": {
|
||||
"labelKey": "",
|
||||
"valueKey": ""
|
||||
},
|
||||
"prom_ql": "increase(ClickHouseErrorMetric_AUTHENTICATION_FAILED[5m]) \u003e 0",
|
||||
"severity": 2
|
||||
},
|
||||
{
|
||||
"prom_ql": "increase(ClickHouseErrorMetric_RESOURCE_ACCESS_DENIED[5m]) \u003e 0",
|
||||
"severity": 3
|
||||
}
|
||||
]
|
||||
},
|
||||
"prom_eval_interval": 30,
|
||||
"enable_stime": "00:00",
|
||||
"enable_stimes": [
|
||||
"00:00"
|
||||
],
|
||||
"enable_etime": "00:00",
|
||||
"enable_etimes": [
|
||||
"00:00"
|
||||
],
|
||||
"enable_days_of_week": [
|
||||
"0",
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6"
|
||||
],
|
||||
"enable_days_of_weeks": [
|
||||
[
|
||||
"0",
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6"
|
||||
]
|
||||
],
|
||||
"enable_in_bg": 0,
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [],
|
||||
"notify_groups_obj": null,
|
||||
"notify_groups": null,
|
||||
"notify_repeat_step": 60,
|
||||
"notify_max_number": 0,
|
||||
"recover_duration": 0,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": [],
|
||||
"annotations": {},
|
||||
"extra_config": null,
|
||||
"create_at": 0,
|
||||
"create_by": "",
|
||||
"update_at": 0,
|
||||
"update_by": "",
|
||||
"uuid": 1719305153863782000
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"group_id": 0,
|
||||
"cate": "prometheus",
|
||||
"datasource_ids": [
|
||||
0
|
||||
],
|
||||
"cluster": "",
|
||||
"name": "ClickHouse Exporter ZooKeeper故障",
|
||||
"note": "",
|
||||
"prod": "metric",
|
||||
"algorithm": "",
|
||||
"algo_params": null,
|
||||
"delay": 0,
|
||||
"severity": 0,
|
||||
"severities": [
|
||||
2
|
||||
],
|
||||
"disabled": 1,
|
||||
"prom_for_duration": 60,
|
||||
"prom_ql": "",
|
||||
"rule_config": {
|
||||
"queries": [
|
||||
{
|
||||
"prom_ql": "avg(ClickHouseMetrics_ZooKeeperSession) != 1",
|
||||
"severity": 2
|
||||
}
|
||||
]
|
||||
},
|
||||
"prom_eval_interval": 30,
|
||||
"enable_stime": "00:00",
|
||||
"enable_stimes": [
|
||||
"00:00"
|
||||
],
|
||||
"enable_etime": "00:00",
|
||||
"enable_etimes": [
|
||||
"00:00"
|
||||
],
|
||||
"enable_days_of_week": [
|
||||
"0",
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6"
|
||||
],
|
||||
"enable_days_of_weeks": [
|
||||
[
|
||||
"0",
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6"
|
||||
]
|
||||
],
|
||||
"enable_in_bg": 0,
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [],
|
||||
"notify_groups_obj": null,
|
||||
"notify_groups": null,
|
||||
"notify_repeat_step": 60,
|
||||
"notify_max_number": 0,
|
||||
"recover_duration": 0,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": [],
|
||||
"annotations": {},
|
||||
"extra_config": null,
|
||||
"create_at": 0,
|
||||
"create_by": "",
|
||||
"update_at": 0,
|
||||
"update_by": "",
|
||||
"uuid": 1719305153865298000
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"group_id": 0,
|
||||
"cate": "prometheus",
|
||||
"datasource_ids": [
|
||||
0
|
||||
],
|
||||
"cluster": "",
|
||||
"name": "ClickHouse Exporter 内存使用",
|
||||
"note": "内存使用报警",
|
||||
"prod": "metric",
|
||||
"algorithm": "",
|
||||
"algo_params": null,
|
||||
"delay": 0,
|
||||
"severity": 0,
|
||||
"severities": [
|
||||
1,
|
||||
2
|
||||
],
|
||||
"disabled": 1,
|
||||
"prom_for_duration": 60,
|
||||
"prom_ql": "",
|
||||
"rule_config": {
|
||||
"queries": [
|
||||
{
|
||||
"keys": {
|
||||
"labelKey": "",
|
||||
"valueKey": ""
|
||||
},
|
||||
"prom_ql": "ClickHouseMetrics_MemoryTracking / ClickHouseAsyncMetrics_OSMemoryTotal * 100 \u003e 90",
|
||||
"severity": 1
|
||||
},
|
||||
{
|
||||
"keys": {
|
||||
"labelKey": "",
|
||||
"valueKey": ""
|
||||
},
|
||||
"prom_ql": "ClickHouseMetrics_MemoryTracking / ClickHouseAsyncMetrics_OSMemoryTotal * 100 \u003e 80",
|
||||
"severity": 2
|
||||
}
|
||||
]
|
||||
},
|
||||
"prom_eval_interval": 30,
|
||||
"enable_stime": "00:00",
|
||||
"enable_stimes": [
|
||||
"00:00"
|
||||
],
|
||||
"enable_etime": "00:00",
|
||||
"enable_etimes": [
|
||||
"00:00"
|
||||
],
|
||||
"enable_days_of_week": [
|
||||
"0",
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6"
|
||||
],
|
||||
"enable_days_of_weeks": [
|
||||
[
|
||||
"0",
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6"
|
||||
]
|
||||
],
|
||||
"enable_in_bg": 0,
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [],
|
||||
"notify_groups_obj": null,
|
||||
"notify_groups": null,
|
||||
"notify_repeat_step": 60,
|
||||
"notify_max_number": 0,
|
||||
"recover_duration": 0,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": [],
|
||||
"annotations": {},
|
||||
"extra_config": null,
|
||||
"create_at": 0,
|
||||
"create_by": "",
|
||||
"update_at": 0,
|
||||
"update_by": "",
|
||||
"uuid": 1719305153866296000
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"group_id": 0,
|
||||
"cate": "prometheus",
|
||||
"datasource_ids": [
|
||||
0
|
||||
],
|
||||
"cluster": "",
|
||||
"name": "ClickHouse Exporter 副本错误",
|
||||
"note": "",
|
||||
"prod": "metric",
|
||||
"algorithm": "",
|
||||
"algo_params": null,
|
||||
"delay": 0,
|
||||
"severity": 0,
|
||||
"severities": [
|
||||
1,
|
||||
3
|
||||
],
|
||||
"disabled": 1,
|
||||
"prom_for_duration": 60,
|
||||
"prom_ql": "",
|
||||
"rule_config": {
|
||||
"queries": [
|
||||
{
|
||||
"prom_ql": "ClickHouseErrorMetric_ALL_REPLICAS_ARE_STALE == 1 or ClickHouseErrorMetric_ALL_REPLICAS_LOST == 1",
|
||||
"severity": 1
|
||||
},
|
||||
{
|
||||
"prom_ql": " ClickHouseErrorMetric_NO_AVAILABLE_REPLICA == 1",
|
||||
"severity": 1
|
||||
},
|
||||
{
|
||||
"prom_ql": " ClickHouseErrorMetric_TOO_FEW_LIVE_REPLICAS == 1",
|
||||
"severity": 3
|
||||
}
|
||||
]
|
||||
},
|
||||
"prom_eval_interval": 30,
|
||||
"enable_stime": "00:00",
|
||||
"enable_stimes": [
|
||||
"00:00"
|
||||
],
|
||||
"enable_etime": "00:00",
|
||||
"enable_etimes": [
|
||||
"00:00"
|
||||
],
|
||||
"enable_days_of_week": [
|
||||
"0",
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6"
|
||||
],
|
||||
"enable_days_of_weeks": [
|
||||
[
|
||||
"0",
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6"
|
||||
]
|
||||
],
|
||||
"enable_in_bg": 0,
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [],
|
||||
"notify_groups_obj": null,
|
||||
"notify_groups": null,
|
||||
"notify_repeat_step": 60,
|
||||
"notify_max_number": 0,
|
||||
"recover_duration": 0,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": [],
|
||||
"annotations": {},
|
||||
"extra_config": null,
|
||||
"create_at": 0,
|
||||
"create_by": "",
|
||||
"update_at": 0,
|
||||
"update_by": "",
|
||||
"uuid": 1719305153867268000
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"group_id": 0,
|
||||
"cate": "prometheus",
|
||||
"datasource_ids": [
|
||||
0
|
||||
],
|
||||
"cluster": "",
|
||||
"name": "ClickHouse Exporter 磁盘使用",
|
||||
"note": "磁盘使用报警",
|
||||
"prod": "metric",
|
||||
"algorithm": "",
|
||||
"algo_params": null,
|
||||
"delay": 0,
|
||||
"severity": 0,
|
||||
"severities": [
|
||||
1,
|
||||
2
|
||||
],
|
||||
"disabled": 1,
|
||||
"prom_for_duration": 60,
|
||||
"prom_ql": "",
|
||||
"rule_config": {
|
||||
"queries": [
|
||||
{
|
||||
"keys": {
|
||||
"labelKey": "",
|
||||
"valueKey": ""
|
||||
},
|
||||
"prom_ql": "ClickHouseAsyncMetrics_DiskAvailable_default / (ClickHouseAsyncMetrics_DiskAvailable_default + ClickHouseAsyncMetrics_DiskUsed_default) * 100 \u003c 10",
|
||||
"severity": 1
|
||||
},
|
||||
{
|
||||
"keys": {
|
||||
"labelKey": "",
|
||||
"valueKey": ""
|
||||
},
|
||||
"prom_ql": "ClickHouseAsyncMetrics_DiskAvailable_default / (ClickHouseAsyncMetrics_DiskAvailable_default + ClickHouseAsyncMetrics_DiskUsed_default) * 100 \u003c 20",
|
||||
"severity": 2
|
||||
},
|
||||
{
|
||||
"prom_ql": "ClickHouseAsyncMetrics_DiskAvailable_backups / (ClickHouseAsyncMetrics_DiskAvailable_backups + ClickHouseAsyncMetrics_DiskUsed_backups) * 100 \u003c 20",
|
||||
"severity": 2
|
||||
}
|
||||
]
|
||||
},
|
||||
"prom_eval_interval": 30,
|
||||
"enable_stime": "00:00",
|
||||
"enable_stimes": [
|
||||
"00:00"
|
||||
],
|
||||
"enable_etime": "00:00",
|
||||
"enable_etimes": [
|
||||
"00:00"
|
||||
],
|
||||
"enable_days_of_week": [
|
||||
"0",
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6"
|
||||
],
|
||||
"enable_days_of_weeks": [
|
||||
[
|
||||
"0",
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6"
|
||||
]
|
||||
],
|
||||
"enable_in_bg": 0,
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [],
|
||||
"notify_groups_obj": null,
|
||||
"notify_groups": null,
|
||||
"notify_repeat_step": 60,
|
||||
"notify_max_number": 0,
|
||||
"recover_duration": 0,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": [],
|
||||
"annotations": {},
|
||||
"extra_config": null,
|
||||
"create_at": 0,
|
||||
"create_by": "",
|
||||
"update_at": 0,
|
||||
"update_by": "",
|
||||
"uuid": 1719305153868363000
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"group_id": 0,
|
||||
"cate": "prometheus",
|
||||
"datasource_ids": [
|
||||
0
|
||||
],
|
||||
"cluster": "",
|
||||
"name": "ClickHouse Exporter 网络故障",
|
||||
"note": "",
|
||||
"prod": "metric",
|
||||
"algorithm": "",
|
||||
"algo_params": null,
|
||||
"delay": 0,
|
||||
"severity": 0,
|
||||
"severities": [
|
||||
2,
|
||||
3
|
||||
],
|
||||
"disabled": 1,
|
||||
"prom_for_duration": 60,
|
||||
"prom_ql": "",
|
||||
"rule_config": {
|
||||
"queries": [
|
||||
{
|
||||
"prom_ql": "ClickHouseMetrics_NetworkSend \u003e 250 or ClickHouseMetrics_NetworkReceive \u003e 250",
|
||||
"severity": 2
|
||||
},
|
||||
{
|
||||
"prom_ql": "ClickHouseMetrics_NetworkSend \u003e 250 or ClickHouseMetrics_NetworkReceive \u003e 250",
|
||||
"severity": 3
|
||||
},
|
||||
{
|
||||
"prom_ql": "increase(ClickHouseMetrics_InterserverConnection[5m]) \u003e 0",
|
||||
"severity": 3
|
||||
}
|
||||
]
|
||||
},
|
||||
"prom_eval_interval": 30,
|
||||
"enable_stime": "00:00",
|
||||
"enable_stimes": [
|
||||
"00:00"
|
||||
],
|
||||
"enable_etime": "00:00",
|
||||
"enable_etimes": [
|
||||
"00:00"
|
||||
],
|
||||
"enable_days_of_week": [
|
||||
"0",
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6"
|
||||
],
|
||||
"enable_days_of_weeks": [
|
||||
[
|
||||
"0",
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6"
|
||||
]
|
||||
],
|
||||
"enable_in_bg": 0,
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [],
|
||||
"notify_groups_obj": null,
|
||||
"notify_groups": null,
|
||||
"notify_repeat_step": 60,
|
||||
"notify_max_number": 0,
|
||||
"recover_duration": 0,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": [],
|
||||
"annotations": {},
|
||||
"extra_config": null,
|
||||
"create_at": 0,
|
||||
"create_by": "",
|
||||
"update_at": 0,
|
||||
"update_by": "",
|
||||
"uuid": 1719305153869486000
|
||||
}
|
||||
]
|
||||
3878
integrations/ClickHouse/dashboards/clickhouse_by_categraf.json
Normal file
3878
integrations/ClickHouse/dashboards/clickhouse_by_categraf.json
Normal file
File diff suppressed because it is too large
Load Diff
6098
integrations/ClickHouse/dashboards/clickhouse_by_exporter.json
Normal file
6098
integrations/ClickHouse/dashboards/clickhouse_by_exporter.json
Normal file
File diff suppressed because it is too large
Load Diff
422
integrations/ClickHouse/metrics/clickhouse_by_categraf.json
Normal file
422
integrations/ClickHouse/metrics/clickhouse_by_categraf.json
Normal file
@@ -0,0 +1,422 @@
|
||||
[
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153888541000,
|
||||
"collector": "Categraf",
|
||||
"typ": "ClickHouse",
|
||||
"name": "ClickHouse HTTP 连接数",
|
||||
"unit": "sishort",
|
||||
"note": "通过HTTP协议连接到ClickHouse服务器的客户端数量。",
|
||||
"lang": "zh_CN",
|
||||
"expression": "clickhouse_metrics_http_connection",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153889950000,
|
||||
"collector": "Categraf",
|
||||
"typ": "ClickHouse",
|
||||
"name": "ClickHouse INSERT查询平均时间",
|
||||
"unit": "sishort",
|
||||
"note": "插入查询执行的平均时间(微秒)。",
|
||||
"lang": "zh_CN",
|
||||
"expression": "clickhouse_events_insert_query_time_microseconds_microseconds",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153890963000,
|
||||
"collector": "Categraf",
|
||||
"typ": "ClickHouse",
|
||||
"name": "ClickHouse SELECT 查询数",
|
||||
"unit": "none",
|
||||
"note": "执行的选择(SELECT)查询的数量",
|
||||
"lang": "zh_CN",
|
||||
"expression": "clickhouse_events_select_query",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153892134000,
|
||||
"collector": "Categraf",
|
||||
"typ": "ClickHouse",
|
||||
"name": "ClickHouse SELECT查询平均时间",
|
||||
"unit": "sishort",
|
||||
"note": "选择查询执行的平均时间(微秒)。",
|
||||
"lang": "zh_CN",
|
||||
"expression": "clickhouse_events_select_query_time_microseconds_microseconds",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153893317000,
|
||||
"collector": "Categraf",
|
||||
"typ": "ClickHouse",
|
||||
"name": "ClickHouse TCP 连接数",
|
||||
"unit": "sishort",
|
||||
"note": "通过TCP协议连接到ClickHouse服务器的客户端数量。",
|
||||
"lang": "zh_CN",
|
||||
"expression": "clickhouse_metrics_tcp_connection",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153894646000,
|
||||
"collector": "Categraf",
|
||||
"typ": "ClickHouse",
|
||||
"name": "ClickHouse 临时数据量",
|
||||
"unit": "sishort",
|
||||
"note": "临时数据部分的数量,这些部分当前正在生成。",
|
||||
"lang": "zh_CN",
|
||||
"expression": "clickhouse_metrics_parts_temporary",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153896151000,
|
||||
"collector": "Categraf",
|
||||
"typ": "ClickHouse",
|
||||
"name": "ClickHouse 分布式表连接数",
|
||||
"unit": "sishort",
|
||||
"note": "发送到分布式表的远程服务器的数据连接数。",
|
||||
"lang": "zh_CN",
|
||||
"expression": "clickhouse_metrics_distributed_send",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153897491000,
|
||||
"collector": "Categraf",
|
||||
"typ": "ClickHouse",
|
||||
"name": "ClickHouse 宽数据量",
|
||||
"unit": "sishort",
|
||||
"note": "宽数据部分的数量。",
|
||||
"lang": "zh_CN",
|
||||
"expression": "clickhouse_metrics_parts_wide",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153899026000,
|
||||
"collector": "Categraf",
|
||||
"typ": "ClickHouse",
|
||||
"name": "ClickHouse 待插入分布式表文件数",
|
||||
"unit": "sishort",
|
||||
"note": "等待异步插入到分布式表的文件数量。",
|
||||
"lang": "zh_CN",
|
||||
"expression": "clickhouse_metrics_distributed_files_to_insert",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153900278000,
|
||||
"collector": "Categraf",
|
||||
"typ": "ClickHouse",
|
||||
"name": "ClickHouse 提交前数据量",
|
||||
"unit": "sishort",
|
||||
"note": "提交前的数据部分数量,这些部分在data_parts列表中,但不用于SELECT查询。",
|
||||
"lang": "zh_CN",
|
||||
"expression": "clickhouse_metrics_parts_pre_committed",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153901527000,
|
||||
"collector": "Categraf",
|
||||
"typ": "ClickHouse",
|
||||
"name": "ClickHouse 提交后数据量",
|
||||
"unit": "sishort",
|
||||
"note": "提交后的数据部分数量,这些部分在data_parts列表中,并且用于SELECT查询。",
|
||||
"lang": "zh_CN",
|
||||
"expression": "clickhouse_metrics_parts_committed",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153902727000,
|
||||
"collector": "Categraf",
|
||||
"typ": "ClickHouse",
|
||||
"name": "ClickHouse 插入未压缩",
|
||||
"unit": "sishort",
|
||||
"note": " 插入操作写入的未压缩字节数。",
|
||||
"lang": "zh_CN",
|
||||
"expression": "clickhouse_events_inserted_bytes",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153904402000,
|
||||
"collector": "Categraf",
|
||||
"typ": "ClickHouse",
|
||||
"name": "ClickHouse 插入行数",
|
||||
"unit": "none",
|
||||
"note": "",
|
||||
"lang": "zh_CN",
|
||||
"expression": "clickhouse_events_inserted_rows",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153905722000,
|
||||
"collector": "Categraf",
|
||||
"typ": "ClickHouse",
|
||||
"name": "ClickHouse 查询优先级",
|
||||
"unit": "sishort",
|
||||
"note": "由于优先级设置,被停止并等待的查询数量。\n",
|
||||
"lang": "zh_CN",
|
||||
"expression": "clickhouse_metrics_query_preempted",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153906824000,
|
||||
"collector": "Categraf",
|
||||
"typ": "ClickHouse",
|
||||
"name": "ClickHouse 查询总数",
|
||||
"unit": "none",
|
||||
"note": "ClickHouse执行的查询总数。",
|
||||
"lang": "zh_CN",
|
||||
"expression": "clickhouse_events_query",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153907953000,
|
||||
"collector": "Categraf",
|
||||
"typ": "ClickHouse",
|
||||
"name": "ClickHouse 查询总时间",
|
||||
"unit": "milliseconds",
|
||||
"note": "查询执行的总时间(微秒)。",
|
||||
"lang": "zh_CN",
|
||||
"expression": "clickhouse_events_query_time_microseconds",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153909480000,
|
||||
"collector": "Categraf",
|
||||
"typ": "ClickHouse",
|
||||
"name": "ClickHouse 正被删除数据量",
|
||||
"unit": "sishort",
|
||||
"note": "正在被删除的数据部分数量。",
|
||||
"lang": "zh_CN",
|
||||
"expression": "clickhouse_metrics_parts_deleting",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153911177000,
|
||||
"collector": "Categraf",
|
||||
"typ": "ClickHouse",
|
||||
"name": "ClickHouse 移动池活动任务数",
|
||||
"unit": "sishort",
|
||||
"note": "后台移动池中的活动任务数,用于处理数据移动。",
|
||||
"lang": "zh_CN",
|
||||
"expression": "clickhouse_metrics_background_move_pool_task",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153912274000,
|
||||
"collector": "Categraf",
|
||||
"typ": "ClickHouse",
|
||||
"name": "ClickHouse 紧凑数据量",
|
||||
"unit": "sishort",
|
||||
"note": "紧凑数据部分的数量。",
|
||||
"lang": "zh_CN",
|
||||
"expression": "clickhouse_metrics_parts_compact",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153913312000,
|
||||
"collector": "Categraf",
|
||||
"typ": "ClickHouse",
|
||||
"name": "ClickHouse 缓冲区活动任务数",
|
||||
"unit": "sishort",
|
||||
"note": "后台缓冲区冲洗调度池中的活动任务数,用于定期缓冲区冲洗。",
|
||||
"lang": "zh_CN",
|
||||
"expression": "clickhouse_metrics_background_buffer_flush_schedule_pool_task",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153914788000,
|
||||
"collector": "Categraf",
|
||||
"typ": "ClickHouse",
|
||||
"name": "ClickHouse 跨磁盘量",
|
||||
"unit": "sishort",
|
||||
"note": "移动到另一个磁盘并应在析构函数中删除的数据部分数量。",
|
||||
"lang": "zh_CN",
|
||||
"expression": "clickhouse_metrics_parts_delete_on_destroy",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153916159000,
|
||||
"collector": "Categraf",
|
||||
"typ": "ClickHouse",
|
||||
"name": "ClickHouse 过时数据量",
|
||||
"unit": "sishort",
|
||||
"note": " 过时的数据部分数量,这些部分不是活动数据部分,但当前SELECT查询可能使用它们。",
|
||||
"lang": "zh_CN",
|
||||
"expression": "clickhouse_metrics_parts_outdated",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153917507000,
|
||||
"collector": "Categraf",
|
||||
"typ": "ClickHouse",
|
||||
"name": "ClickHouse中内存使用情况",
|
||||
"unit": "sishort",
|
||||
"note": "ClickHouse服务器使用的总内存量。",
|
||||
"lang": "zh_CN",
|
||||
"expression": "clickhouse_metrics_memory_tracking",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153918455000,
|
||||
"collector": "Categraf",
|
||||
"typ": "ClickHouse",
|
||||
"name": "ClickHouse中数据库数量",
|
||||
"unit": "none",
|
||||
"note": "ClickHouse数据库数量",
|
||||
"lang": "zh_CN",
|
||||
"expression": "clickhouse_asynchronous_metrics_number_of_databases",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153919709000,
|
||||
"collector": "Categraf",
|
||||
"typ": "ClickHouse",
|
||||
"name": "ClickHouse中表的数量",
|
||||
"unit": "none",
|
||||
"note": "ClickHouse表数量",
|
||||
"lang": "zh_CN",
|
||||
"expression": "clickhouse_asynchronous_metrics_number_of_tables",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153920898000,
|
||||
"collector": "Categraf",
|
||||
"typ": "ClickHouse",
|
||||
"name": "ClickHouse修订",
|
||||
"unit": "none",
|
||||
"note": "ClickHouse服务器的修订号,通常是一个用于标识特定构建的数字。",
|
||||
"lang": "zh_CN",
|
||||
"expression": "clickhouse_metrics_revision",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153921934000,
|
||||
"collector": "Categraf",
|
||||
"typ": "ClickHouse",
|
||||
"name": "ClickHouse服务器运行时间",
|
||||
"unit": "sishort",
|
||||
"note": "ClickHouse服务器自启动以来的运行时间。",
|
||||
"lang": "zh_CN",
|
||||
"expression": "clickhouse_asynchronous_metrics_uptime",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153923130000,
|
||||
"collector": "Categraf",
|
||||
"typ": "ClickHouse",
|
||||
"name": "ClickHouse版本号",
|
||||
"unit": "none",
|
||||
"note": "ClickHouse服务器的版本号,以整数形式表示。",
|
||||
"lang": "zh_CN",
|
||||
"expression": "clickhouse_metrics_version_integer",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
}
|
||||
]
|
||||
797
integrations/ClickHouse/metrics/clickhouse_by_exporter.json
Normal file
797
integrations/ClickHouse/metrics/clickhouse_by_exporter.json
Normal file
@@ -0,0 +1,797 @@
|
||||
[
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153924793000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "ClickHouse Tcp 连接数",
|
||||
"unit": "none",
|
||||
"note": "tcp连接数",
|
||||
"lang": "zh_CN",
|
||||
"expression": "ClickHouseMetrics_TCPConnection",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153926074000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "ClickHouse 内存",
|
||||
"unit": "bitsIEC",
|
||||
"note": "分配的内存总量",
|
||||
"lang": "zh_CN",
|
||||
"expression": "ClickHouseMetrics_MemoryTracking",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153927130000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "INSERT查询平均延迟",
|
||||
"unit": "microseconds",
|
||||
"note": "INSERT查询平均延迟",
|
||||
"lang": "zh_CN",
|
||||
"expression": "increase(ClickHouseProfileEvents_InsertQueryTimeMicroseconds[1m]) / (increase(ClickHouseProfileEvents_InsertQuery[1m]) + 0.001)",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153928310000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "INSERT查询数",
|
||||
"unit": "queries",
|
||||
"note": "与查询数相同,但仅限于INSERT查询",
|
||||
"lang": "zh_CN",
|
||||
"expression": "max_over_time(irate(ClickHouseProfileEvents_InsertQuery[2m]) [1h:1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153929755000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "lseek函数调用次数",
|
||||
"unit": "times",
|
||||
"note": "'lseek'函数被调用的次数",
|
||||
"lang": "zh_CN",
|
||||
"expression": "max_over_time(irate(ClickHouseProfileEvents_Seek[2m]) [1h:1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153931299000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "MergeTree表写入的压缩字节数",
|
||||
"unit": "bytes",
|
||||
"note": "",
|
||||
"lang": "zh_CN",
|
||||
"expression": "max_over_time(irate(ClickHouseProfileEvents_MergeTreeDataWriterCompressedBytes[2m]) [1h:1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153932255000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "MergeTree表插入的数据块数",
|
||||
"unit": "blocks",
|
||||
"note": "插入到MergeTree表的数据块数。每个块形成一个数据部分",
|
||||
"lang": "zh_CN",
|
||||
"expression": "max_over_time(irate(ClickHouseProfileEvents_MergeTreeDataWriterBlocks[2m]) [1h:1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153933664000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "MergeTree表插入的未压缩字节数",
|
||||
"unit": "bytes",
|
||||
"note": "插入到MergeTree表的未压缩字节数(列以它们在内存中存储的形式)\n\n在ClickHouse数据库中,当数据被插入到MergeTree系列表(包括ReplicatedMergeTree等)时,在数据实际被写入磁盘并经过压缩处理之前,在内存中以原始格式暂存时所占用的字节数量。这里的“未压缩”意味着数据还未经过ClickHouse为了节省存储空间而在存储阶段执行的列式存储压缩算法处理",
|
||||
"lang": "zh_CN",
|
||||
"expression": "max_over_time(irate(ClickHouseProfileEvents_MergeTreeDataWriterUncompressedBytes[2m]) [1h:1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153934869000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "MergeTree表插入的行数",
|
||||
"unit": "rows",
|
||||
"note": "插入到MergeTree表的行数",
|
||||
"lang": "zh_CN",
|
||||
"expression": "max_over_time(irate(ClickHouseProfileEvents_MergeTreeDataWriterRows[2m]) [1h:1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153935835000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "SELECT查询平均延迟",
|
||||
"unit": "microseconds",
|
||||
"note": "SELECT查询平均延迟",
|
||||
"lang": "zh_CN",
|
||||
"expression": "increase(ClickHouseProfileEvents_SelectQueryTimeMicroseconds[1m]) / (increase(ClickHouseProfileEvents_SelectQuery[1m]) + 0.001)",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153937045000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "SELECT查询数",
|
||||
"unit": "queries",
|
||||
"note": "SELECT查询的数量",
|
||||
"lang": "zh_CN",
|
||||
"expression": "max_over_time(irate(ClickHouseProfileEvents_SelectQuery[2m]) [1h:1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153938270000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "SELECT查询的字节数",
|
||||
"unit": "bytes",
|
||||
"note": "从所有表SELECT的字节数(未压缩列以它们在内存中存储的形式)",
|
||||
"lang": "zh_CN",
|
||||
"expression": "max_over_time(irate(ClickHouseProfileEvents_SelectedBytes[2m]) [1h:1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153939642000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "SELECT查询的行数",
|
||||
"unit": "rows",
|
||||
"note": "从所有表SELECT的行数",
|
||||
"lang": "zh_CN",
|
||||
"expression": "max_over_time(irate(ClickHouseProfileEvents_SelectedRows[2m]) [1h:1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153940852000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "TCP连接数",
|
||||
"unit": "connections",
|
||||
"note": "与 TCP 服务器(带本地接口的客户端)的连接数,也包括服务器-服务器连接",
|
||||
"lang": "zh_CN",
|
||||
"expression": "ClickHouseMetrics_TCPConnection",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153941862000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "临时部分数",
|
||||
"unit": "parts",
|
||||
"note": "目前正在生成的部分,不在数据部分列表中",
|
||||
"lang": "zh_CN",
|
||||
"expression": "ClickHouseMetrics_PartsTemporary",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153942864000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "从文件描述符读取失败次数",
|
||||
"unit": "times",
|
||||
"note": "从文件描述符读取(read/pread)失败的次数",
|
||||
"lang": "zh_CN",
|
||||
"expression": "max_over_time(irate(ClickHouseProfileEvents_ReadBufferFromFileDescriptorReadFailed[2m]) [1h:1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153943822000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "从文件描述符读取次数",
|
||||
"unit": "reads",
|
||||
"note": "从文件描述符进行读取(read/pread)的次数,不包括套接字",
|
||||
"lang": "zh_CN",
|
||||
"expression": "max_over_time(irate(ClickHouseProfileEvents_ReadBufferFromFileDescriptorRead[2m]) [1h:1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153944918000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "从文件描述符读取的字节数",
|
||||
"unit": "bytes",
|
||||
"note": "从文件描述符读取的字节数。如果文件是压缩的,这将显示压缩后的数据大小",
|
||||
"lang": "zh_CN",
|
||||
"expression": "irate(ClickHouseProfileEvents_ReadBufferFromFileDescriptorReadBytes[2m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153946307000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "保留空间",
|
||||
"unit": "bytes",
|
||||
"note": "为当前运行的后台合并保留的磁盘空间。它略大于当前合并部分的总大小",
|
||||
"lang": "zh_CN",
|
||||
"expression": "ClickHouseMetrics_DiskSpaceReservedForMerge",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153947296000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "内存占用",
|
||||
"unit": "bytes",
|
||||
"note": "分配的内存总量",
|
||||
"lang": "zh_CN",
|
||||
"expression": "ClickHouseMetrics_MemoryTracking",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153948199000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "写入文件描述符次数",
|
||||
"unit": "writes",
|
||||
"note": "写入文件描述符(write/pwrite)的次数,不包括套接字",
|
||||
"lang": "zh_CN",
|
||||
"expression": "max_over_time(irate(ClickHouseProfileEvents_WriteBufferFromFileDescriptorWrite[2m]) [1h:1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153949391000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "写入文件描述符的字节数",
|
||||
"unit": "bytes",
|
||||
"note": "写入文件描述符的字节数。如果文件是压缩的,这将显示压缩后的数据大小",
|
||||
"lang": "zh_CN",
|
||||
"expression": "max_over_time(irate(ClickHouseProfileEvents_WriteBufferFromFileDescriptorWriteBytes[2m]) [1h:1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153950577000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "合并平均持续时间",
|
||||
"unit": "milliseconds",
|
||||
"note": "合并的平均持续时间",
|
||||
"lang": "zh_CN",
|
||||
"expression": "increase(ClickHouseProfileEvents_MergesTimeMilliseconds[1m]) / increase(ClickHouseProfileEvents_Merge[1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153953146000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "合并读取的未压缩字节数",
|
||||
"unit": "bytes",
|
||||
"note": "后台合并读取的未压缩字节数(列以它们在内存中存储的形式)。这是合并前的字节数",
|
||||
"lang": "zh_CN",
|
||||
"expression": "max_over_time(irate(ClickHouseProfileEvents_MergedUncompressedBytes[2m]) [1h:1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153954853000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "合并读取的行数",
|
||||
"unit": "rows",
|
||||
"note": "后台合并读取的行数。这是合并前的行数",
|
||||
"lang": "zh_CN",
|
||||
"expression": "max_over_time(irate(ClickHouseProfileEvents_MergedRows[2m]) [1h:1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153956608000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "后台合并次数",
|
||||
"unit": "times",
|
||||
"note": "启动的后台合并次数",
|
||||
"lang": "zh_CN",
|
||||
"expression": "max_over_time(irate(ClickHouseProfileEvents_Merge[2m]) [1h:1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153957668000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "复制部分合并次数",
|
||||
"unit": "times",
|
||||
"note": "ReplicatedMergeTree表的数据部分成功合并的次数",
|
||||
"lang": "zh_CN",
|
||||
"expression": "max_over_time(irate(ClickHouseProfileEvents_ReplicatedPartMerges[2m]) [1h:1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153958797000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "复制部分数据丢失次数",
|
||||
"unit": "times",
|
||||
"note": "数据在任何副本上都不存在的次数(即使是现在离线的副本)。这些数据部分肯定丢失了。由于异步复制(如果未启用配额插入),当写入数据部分的副本失败并且在故障后重新联机时不包含该数据部分,这是正常现象",
|
||||
"lang": "zh_CN",
|
||||
"expression": "max_over_time(irate(ClickHouseProfileEvents_ReplicatedDataLoss[2m]) [1h:1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153959861000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "失败的INSERT查询数",
|
||||
"unit": "times",
|
||||
"note": "与失败的查询相同,但仅限于INSERT查询",
|
||||
"lang": "zh_CN",
|
||||
"expression": "max_over_time(irate(ClickHouseProfileEvents_FailedInsertQuery[2m]) [1h:1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153961190000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "失败的SELECT查询数",
|
||||
"unit": "queries",
|
||||
"note": "与失败的查询相同,但仅限于SELECT查询",
|
||||
"lang": "zh_CN",
|
||||
"expression": "max_over_time(irate(ClickHouseProfileEvents_FailedSelectQuery[2m]) [1h:1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153962249000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "失败的查询数",
|
||||
"unit": "queries",
|
||||
"note": "失败的查询数量",
|
||||
"lang": "zh_CN",
|
||||
"expression": "max_over_time(irate(ClickHouseProfileEvents_FailedQuery[2m]) [1h:1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153963287000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "延迟插入次数",
|
||||
"unit": "times",
|
||||
"note": "由于分区的活动数据部分数量过多,INSERT到MergeTree表的块被限制的次数",
|
||||
"lang": "zh_CN",
|
||||
"expression": "max_over_time(irate(ClickHouseProfileEvents_DelayedInserts[2m]) [1h:1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153964822000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "延迟插入阻塞的平均等待时间",
|
||||
"unit": "milliseconds",
|
||||
"note": "由于分区的活动数据部分数量过多,INSERT到MergeTree表的块被限制时的总等待时间(毫秒)",
|
||||
"lang": "zh_CN",
|
||||
"expression": "increase(ClickHouseProfileEvents_DelayedInsertsMilliseconds[1m]) / (increase(ClickHouseProfileEvents_DelayedInserts[1m]) + 0.01)",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153966130000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "慢查询次数",
|
||||
"unit": "times",
|
||||
"note": "从文件中进行慢查询读取的次数,这表明系统过载",
|
||||
"lang": "zh_CN",
|
||||
"expression": "max_over_time(irate(ClickHouseProfileEvents_SlowRead[2m]) [1h:1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153967132000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "打开的文件数",
|
||||
"unit": "files",
|
||||
"note": "打开的文件数量",
|
||||
"lang": "zh_CN",
|
||||
"expression": "max_over_time(irate(ClickHouseProfileEvents_FileOpen[2m]) [1h:1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153968376000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "拒绝插入次数",
|
||||
"unit": "times",
|
||||
"note": "由于分区的活动数据部分数量过多,INSERT到MergeTree表的块被拒绝的次数",
|
||||
"lang": "zh_CN",
|
||||
"expression": "max_over_time(irate(ClickHouseProfileEvents_RejectedInserts[2m]) [1h:1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153969972000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "提交部分数",
|
||||
"unit": "parts",
|
||||
"note": "已经提交的数据部分的数量",
|
||||
"lang": "zh_CN",
|
||||
"expression": "ClickHouseMetrics_PartsCommitted",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153971113000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "插入字节数",
|
||||
"unit": "bytes",
|
||||
"note": "所有表INSERT的字节数(未压缩列以它们在内存中存储的形式)",
|
||||
"lang": "zh_CN",
|
||||
"expression": "max_over_time(irate(ClickHouseProfileEvents_InsertedBytes[2m]) [1h:1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153972182000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "插入行数",
|
||||
"unit": "rows",
|
||||
"note": "所有表INSERT的行数",
|
||||
"lang": "zh_CN",
|
||||
"expression": "max_over_time(irate(ClickHouseProfileEvents_InsertedRows[2m]) [1h:1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153973527000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "未压缩缓存命中次数",
|
||||
"unit": "times",
|
||||
"note": "未压缩缓存命中的次数",
|
||||
"lang": "zh_CN",
|
||||
"expression": "max_over_time(irate(ClickHouseProfileEvents_UncompressedCacheHits[2m]) [1h:1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153974747000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "未压缩缓存未命中次数",
|
||||
"unit": "times",
|
||||
"note": "未压缩缓存未命中的次数",
|
||||
"lang": "zh_CN",
|
||||
"expression": "max_over_time(irate(ClickHouseProfileEvents_UncompressedCacheMisses[2m]) [1h:1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153976184000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "查询内存限制超标次数",
|
||||
"unit": "times",
|
||||
"note": "查询内存限制超标的次数",
|
||||
"lang": "zh_CN",
|
||||
"expression": "max_over_time(irate(ClickHouseProfileEvents_QueryMemoryLimitExceeded[2m]) [1h:1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153977623000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "查询处理线程降低次数",
|
||||
"unit": "times",
|
||||
"note": "由于慢查询读取,降低查询处理线程数的次数",
|
||||
"lang": "zh_CN",
|
||||
"expression": "max_over_time(irate(ClickHouseProfileEvents_ReadBackoff[2m]) [1h:1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153978786000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "查询平均延迟",
|
||||
"unit": "microseconds",
|
||||
"note": "查询平均延迟",
|
||||
"lang": "zh_CN",
|
||||
"expression": "increase(ClickHouseProfileEvents_QueryTimeMicroseconds[1m]) / (increase(ClickHouseProfileEvents_Query[1m]) + 0.001)",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153980379000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "查询总数",
|
||||
"unit": "queries",
|
||||
"note": "需要解释和可能执行的查询数量,不包括失败的查询",
|
||||
"lang": "zh_CN",
|
||||
"expression": "max_over_time(irate(ClickHouseProfileEvents_Query[2m]) [1h:1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153981570000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "标记缓存命中次数",
|
||||
"unit": "times",
|
||||
"note": "标记缓存命中的次数",
|
||||
"lang": "zh_CN",
|
||||
"expression": "max_over_time(irate(ClickHouseProfileEvents_MarkCacheHits[2m]) [1h:1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153983200000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "标记缓存未命中次数",
|
||||
"unit": "times",
|
||||
"note": "标记缓存未命中的次数",
|
||||
"lang": "zh_CN",
|
||||
"expression": "rate(ClickHouseProfileEvents_MarkCacheMisses[2m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153984657000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "读取的压缩块数",
|
||||
"unit": "blocks",
|
||||
"note": "从压缩源(文件,网络)读取的压缩块数(独立压缩的数据块)",
|
||||
"lang": "zh_CN",
|
||||
"expression": "max_over_time(irate(ClickHouseProfileEvents_CompressedReadBufferBlocks[2m]) [1h:1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153985923000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "读取的数据部分数",
|
||||
"unit": "parts",
|
||||
"note": "从MergeTree表读取的数据部分数",
|
||||
"lang": "zh_CN",
|
||||
"expression": "max_over_time(irate(ClickHouseProfileEvents_SelectedParts[2m]) [1h:1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153987437000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "读取的未压缩字节数",
|
||||
"unit": "bytes",
|
||||
"note": "从压缩源(文件,网络)读取的未压缩字节数(解压后的字节数)",
|
||||
"lang": "zh_CN",
|
||||
"expression": "max_over_time(irate(ClickHouseProfileEvents_CompressedReadBufferBytes[2m]) [1h:1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153988925000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "读取的标记数",
|
||||
"unit": "marks",
|
||||
"note": "从MergeTree表读取的标记数(索引粒度)",
|
||||
"lang": "zh_CN",
|
||||
"expression": "irate(ClickHouseProfileEvents_SelectedMarks[2m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153990107000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "读取的范围数",
|
||||
"unit": "ranges",
|
||||
"note": "从MergeTree表读取的所有数据部分中(非相邻)的范围数",
|
||||
"lang": "zh_CN",
|
||||
"expression": "max_over_time(irate(ClickHouseProfileEvents_SelectedRanges[2m]) [1h:1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"uuid": 1719305153991749000,
|
||||
"collector": "Exporter",
|
||||
"typ": "ClickHouse",
|
||||
"name": "预提交部分数",
|
||||
"unit": "parts",
|
||||
"note": "在数据部分中,但不用于SELECT查询的部分",
|
||||
"lang": "zh_CN",
|
||||
"expression": "ClickHouseMetrics_PartsPreCommitted",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
}
|
||||
]
|
||||
@@ -192,7 +192,7 @@
|
||||
"prom_ql": "",
|
||||
"queries": [
|
||||
{
|
||||
"prom_ql": "elasticsearch_filesystem_data_available_bytes / elasticsearch_filesystem_data_size_bytes * 100 \u003c 10",
|
||||
"prom_ql": "elasticsearch_filesystem_data_available_bytes / elasticsearch_filesystem_data_size_in_bytes * 100 \u003c 10",
|
||||
"severity": 1
|
||||
}
|
||||
],
|
||||
@@ -275,7 +275,7 @@
|
||||
"prom_ql": "",
|
||||
"queries": [
|
||||
{
|
||||
"prom_ql": "elasticsearch_filesystem_data_available_bytes / elasticsearch_filesystem_data_size_bytes * 100 \u003c 20",
|
||||
"prom_ql": "elasticsearch_filesystem_data_available_bytes / elasticsearch_filesystem_data_size_in_bytes * 100 \u003c 20",
|
||||
"severity": 2
|
||||
}
|
||||
],
|
||||
@@ -1078,4 +1078,4 @@
|
||||
"update_by": "",
|
||||
"uuid": 1717556327360313000
|
||||
}
|
||||
]
|
||||
]
|
||||
|
||||
@@ -4,7 +4,6 @@ ElasticSearch 通过 HTTP JSON 的方式暴露了自身的监控指标,通过
|
||||
|
||||
如果是小规模集群,设置 `local=false`,从集群中某一个节点抓取数据,即可拿到整个集群所有节点的监控数据。如果是大规模集群,建议设置 `local=true`,在集群的每个节点上都部署抓取器,抓取本地 elasticsearch 进程的监控数据。
|
||||
|
||||
ElasticSearch 详细的监控讲解,请参考这篇 [文章](https://time.geekbang.org/column/article/628847)。
|
||||
|
||||
## 配置示例
|
||||
|
||||
|
||||
463
integrations/IPMI/dashboards/IPMI_by_prometheus.json
Normal file
463
integrations/IPMI/dashboards/IPMI_by_prometheus.json
Normal file
@@ -0,0 +1,463 @@
|
||||
{
|
||||
"name": "IPMI for Prometheus",
|
||||
"ident": "",
|
||||
"configs": {
|
||||
"version": "2.0.0",
|
||||
"links": [],
|
||||
"var": [
|
||||
{
|
||||
"name": "node",
|
||||
"type": "query",
|
||||
"datasource": {
|
||||
"cate": "prometheus"
|
||||
},
|
||||
"definition": "label_values(ipmi_bmc_info, ident)",
|
||||
"reg": "",
|
||||
"multi": false
|
||||
}
|
||||
],
|
||||
"panels": [
|
||||
{
|
||||
"type": "gauge",
|
||||
"id": "f975fded-f57e-4a6e-80b4-50d5be6dd84c",
|
||||
"layout": {
|
||||
"h": 7,
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 0,
|
||||
"i": "f975fded-f57e-4a6e-80b4-50d5be6dd84c",
|
||||
"isResizable": true
|
||||
},
|
||||
"version": "2.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "ipmi_temperature_celsius{ident='$node'}",
|
||||
"legend": "{{name}}"
|
||||
}
|
||||
],
|
||||
"transformations": [
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {}
|
||||
}
|
||||
],
|
||||
"name": "Temperatures",
|
||||
"links": [],
|
||||
"custom": {
|
||||
"textMode": "valueAndName",
|
||||
"calc": "avg"
|
||||
},
|
||||
"options": {
|
||||
"valueMappings": [],
|
||||
"standardOptions": {
|
||||
"util": "none"
|
||||
},
|
||||
"thresholds": {
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null,
|
||||
"type": "base"
|
||||
},
|
||||
{
|
||||
"color": "red",
|
||||
"value": 80
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "timeseries",
|
||||
"id": "681f1191-4777-4377-8b77-404d9f036406",
|
||||
"layout": {
|
||||
"h": 5,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 7,
|
||||
"i": "681f1191-4777-4377-8b77-404d9f036406",
|
||||
"isResizable": true
|
||||
},
|
||||
"version": "2.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "ipmi_power_watts{ident='$node'}",
|
||||
"legend": "{{name}}"
|
||||
}
|
||||
],
|
||||
"transformations": [
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {}
|
||||
}
|
||||
],
|
||||
"name": "Power",
|
||||
"links": [],
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden"
|
||||
},
|
||||
"standardOptions": {},
|
||||
"thresholds": {
|
||||
"steps": [
|
||||
{
|
||||
"color": "#634CD9",
|
||||
"value": null,
|
||||
"type": "base"
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "smooth",
|
||||
"spanNulls": false,
|
||||
"lineWidth": 1,
|
||||
"fillOpacity": 0.5,
|
||||
"gradientMode": "none",
|
||||
"stack": "off",
|
||||
"scaleDistribution": {
|
||||
"type": "linear"
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "timeseries",
|
||||
"id": "feede24c-8296-4127-982e-08cfc4151933",
|
||||
"layout": {
|
||||
"h": 5,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 7,
|
||||
"i": "feede24c-8296-4127-982e-08cfc4151933",
|
||||
"isResizable": true
|
||||
},
|
||||
"version": "2.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "ipmi_power_watts{ident='$node'} * 30 * 24 ",
|
||||
"legend": "{{name}}"
|
||||
}
|
||||
],
|
||||
"transformations": [
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {}
|
||||
}
|
||||
],
|
||||
"name": "Power usage 30d",
|
||||
"links": [],
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden"
|
||||
},
|
||||
"standardOptions": {},
|
||||
"thresholds": {
|
||||
"steps": [
|
||||
{
|
||||
"color": "#634CD9",
|
||||
"value": null,
|
||||
"type": "base"
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "smooth",
|
||||
"spanNulls": false,
|
||||
"lineWidth": 1,
|
||||
"fillOpacity": 0.5,
|
||||
"gradientMode": "none",
|
||||
"stack": "off",
|
||||
"scaleDistribution": {
|
||||
"type": "linear"
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "timeseries",
|
||||
"id": "9e11e7f5-ed3c-49eb-8a72-ee76c8700c24",
|
||||
"layout": {
|
||||
"h": 7,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 12,
|
||||
"i": "9e11e7f5-ed3c-49eb-8a72-ee76c8700c24",
|
||||
"isResizable": true
|
||||
},
|
||||
"version": "2.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "ipmi_temperature_celsius{ident='$node'}",
|
||||
"legend": "{{name}}"
|
||||
}
|
||||
],
|
||||
"transformations": [
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {}
|
||||
}
|
||||
],
|
||||
"name": "Temperatures",
|
||||
"links": [],
|
||||
"description": "",
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "multi"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "list",
|
||||
"placement": "bottom"
|
||||
},
|
||||
"standardOptions": {
|
||||
"util": "none"
|
||||
},
|
||||
"thresholds": {
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null,
|
||||
"type": "base"
|
||||
},
|
||||
{
|
||||
"color": "red",
|
||||
"value": 80
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "linear",
|
||||
"spanNulls": false,
|
||||
"lineWidth": 1,
|
||||
"fillOpacity": 0.5,
|
||||
"gradientMode": "none",
|
||||
"stack": "off",
|
||||
"scaleDistribution": {
|
||||
"type": "linear"
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "timeseries",
|
||||
"id": "95c734f7-26cb-41a7-8376-49332cc220c2",
|
||||
"layout": {
|
||||
"h": 7,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 12,
|
||||
"i": "95c734f7-26cb-41a7-8376-49332cc220c2",
|
||||
"isResizable": true
|
||||
},
|
||||
"version": "2.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "ipmi_power_watts{ident='$node'}",
|
||||
"legend": "{{name}}"
|
||||
}
|
||||
],
|
||||
"transformations": [
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {}
|
||||
}
|
||||
],
|
||||
"name": "Power",
|
||||
"links": [],
|
||||
"description": "",
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "multi"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "list",
|
||||
"placement": "bottom"
|
||||
},
|
||||
"standardOptions": {
|
||||
"util": "none"
|
||||
},
|
||||
"thresholds": {
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null,
|
||||
"type": "base"
|
||||
},
|
||||
{
|
||||
"color": "red",
|
||||
"value": 80
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "linear",
|
||||
"spanNulls": false,
|
||||
"lineWidth": 1,
|
||||
"fillOpacity": 0.01,
|
||||
"gradientMode": "none",
|
||||
"stack": "off",
|
||||
"scaleDistribution": {
|
||||
"type": "linear"
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "timeseries",
|
||||
"id": "0313f34f-afcf-41e9-8f69-9a3dbd4b2e56",
|
||||
"layout": {
|
||||
"h": 7,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 19,
|
||||
"i": "0313f34f-afcf-41e9-8f69-9a3dbd4b2e56",
|
||||
"isResizable": true
|
||||
},
|
||||
"version": "2.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "ipmi_fan_speed_rpm{ident='$node'}",
|
||||
"legend": "{{name}}"
|
||||
}
|
||||
],
|
||||
"transformations": [
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {}
|
||||
}
|
||||
],
|
||||
"name": "Fans",
|
||||
"links": [],
|
||||
"description": "",
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "multi"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "list",
|
||||
"placement": "bottom"
|
||||
},
|
||||
"standardOptions": {
|
||||
"util": "none"
|
||||
},
|
||||
"thresholds": {
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null,
|
||||
"type": "base"
|
||||
},
|
||||
{
|
||||
"color": "red",
|
||||
"value": 80
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "linear",
|
||||
"spanNulls": false,
|
||||
"lineWidth": 1,
|
||||
"fillOpacity": 0.5,
|
||||
"gradientMode": "none",
|
||||
"stack": "off",
|
||||
"scaleDistribution": {
|
||||
"type": "linear"
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "timeseries",
|
||||
"id": "29ee004d-a95c-405d-97d1-d715fab4e1de",
|
||||
"layout": {
|
||||
"h": 7,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 19,
|
||||
"i": "29ee004d-a95c-405d-97d1-d715fab4e1de",
|
||||
"isResizable": true
|
||||
},
|
||||
"version": "2.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "ipmi_voltage_volts{ident='$node',name!~\"Voltage 1|Voltage 2\"}",
|
||||
"legend": "{{name}}"
|
||||
}
|
||||
],
|
||||
"transformations": [
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {}
|
||||
}
|
||||
],
|
||||
"name": "Voltages",
|
||||
"links": [],
|
||||
"description": "",
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "multi"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "list",
|
||||
"placement": "bottom"
|
||||
},
|
||||
"standardOptions": {
|
||||
"util": "none"
|
||||
},
|
||||
"thresholds": {
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null,
|
||||
"type": "base"
|
||||
},
|
||||
{
|
||||
"color": "red",
|
||||
"value": 80
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "linear",
|
||||
"spanNulls": false,
|
||||
"lineWidth": 1,
|
||||
"fillOpacity": 0.5,
|
||||
"gradientMode": "none",
|
||||
"stack": "off",
|
||||
"scaleDistribution": {
|
||||
"type": "linear"
|
||||
}
|
||||
}
|
||||
}
|
||||
]
|
||||
},
|
||||
"uuid": 1727587308068775200
|
||||
}
|
||||
@@ -1,6 +1,6 @@
|
||||
# kafka plugin
|
||||
|
||||
Kafka 的核心指标,其实都是通过 JMX 的方式暴露的,可以参考这篇 [文章](https://time.geekbang.org/column/article/628498)。对于 JMX 暴露的指标,使用 jolokia 或者使用 jmx_exporter 那个 jar 包来采集即可,不需要本插件。
|
||||
Kafka 的核心指标,其实都是通过 JMX 的方式暴露的。对于 JMX 暴露的指标,使用 jolokia 或者使用 jmx_exporter 那个 jar 包来采集即可,不需要本插件。
|
||||
|
||||
本插件主要是采集的消费者延迟数据,这个数据无法通过 Kafka 服务端的 JMX 拿到。
|
||||
|
||||
|
||||
1706
integrations/Kubernetes/dashboards/DeploymentContainer.json
Normal file
1706
integrations/Kubernetes/dashboards/DeploymentContainer.json
Normal file
File diff suppressed because it is too large
Load Diff
1706
integrations/Kubernetes/dashboards/StatefulsetContainer.json
Normal file
1706
integrations/Kubernetes/dashboards/StatefulsetContainer.json
Normal file
File diff suppressed because it is too large
Load Diff
@@ -1,6 +1,6 @@
|
||||
# Kubernetes
|
||||
|
||||
这个插件已经废弃。Kubernetes 监控系列可以参考这个 [文章](https://flashcat.cloud/categories/kubernetes%E7%9B%91%E6%8E%A7%E4%B8%93%E6%A0%8F/)。或者参考 [专栏](https://time.geekbang.org/column/article/630306)。
|
||||
这个插件已经废弃。Kubernetes 监控系列可以参考这个 [文章](https://flashcat.cloud/categories/kubernetes%E7%9B%91%E6%8E%A7%E4%B8%93%E6%A0%8F/)。
|
||||
|
||||
不过 Kubernetes 这个类别下的内置告警规则和内置仪表盘都是可以使用的。
|
||||
|
||||
|
||||
@@ -1425,7 +1425,7 @@
|
||||
"rule_config": {
|
||||
"queries": [
|
||||
{
|
||||
"prom_ql": "kernel_vmstat_oom_kill != 0",
|
||||
"prom_ql": "increase(kernel_vmstat_oom_kill[2m]) > 0",
|
||||
"severity": 2
|
||||
}
|
||||
]
|
||||
@@ -2139,4 +2139,4 @@
|
||||
"update_by": "",
|
||||
"uuid": 1717556327737117000
|
||||
}
|
||||
]
|
||||
]
|
||||
|
||||
@@ -1,13 +1,6 @@
|
||||
{
|
||||
"id": 0,
|
||||
"group_id": 0,
|
||||
"name": "机器台账表格视图",
|
||||
"ident": "",
|
||||
"tags": "",
|
||||
"create_at": 0,
|
||||
"create_by": "",
|
||||
"update_at": 0,
|
||||
"update_by": "",
|
||||
"configs": {
|
||||
"links": [
|
||||
{
|
||||
@@ -28,7 +21,7 @@
|
||||
"colorRange": [
|
||||
"thresholds"
|
||||
],
|
||||
"detailUrl": "/dashboards-built-in/detail?__built-in-cate=Linux\u0026__built-in-name=Linux%20Host%20by%20Categraf%20v2\u0026ident=${__field.labels.ident}",
|
||||
"detailUrl": "/built-in-components/dashboard/detail?__uuid__=1717556327744505000&ident=${__field.labels.ident}",
|
||||
"textMode": "valueAndName",
|
||||
"valueField": "Value"
|
||||
},
|
||||
@@ -98,7 +91,7 @@
|
||||
"colorRange": [
|
||||
"thresholds"
|
||||
],
|
||||
"detailUrl": "/dashboards-built-in/detail?__built-in-cate=Linux\u0026__built-in-name=Linux%20Host%20by%20Categraf%20v2\u0026ident=${__field.labels.ident}",
|
||||
"detailUrl": "/built-in-components/dashboard/detail?__uuid__=1717556327744505000&ident=${__field.labels.ident}",
|
||||
"textMode": "valueAndName",
|
||||
"valueField": "Value"
|
||||
},
|
||||
@@ -171,13 +164,16 @@
|
||||
"linkMode": "appendLinkColumn",
|
||||
"links": [
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "详情",
|
||||
"url": "/dashboards-built-in/detail?__built-in-cate=Linux\u0026__built-in-name=Linux%20Host%20by%20Categraf%20v2\u0026ident=${__field.labels.ident}"
|
||||
"url": "/built-in-components/dashboard/detail?__uuid__=1717556327744505000&ident=${__field.labels.ident}"
|
||||
}
|
||||
],
|
||||
"nowrap": false,
|
||||
"showHeader": true,
|
||||
"sortColumn": "ident",
|
||||
"sortOrder": "ascend"
|
||||
"sortOrder": "ascend",
|
||||
"tableLayout": "fixed"
|
||||
},
|
||||
"datasourceCate": "prometheus",
|
||||
"datasourceValue": "${prom}",
|
||||
@@ -385,10 +381,5 @@
|
||||
],
|
||||
"version": "3.0.0"
|
||||
},
|
||||
"public": 0,
|
||||
"public_cate": 0,
|
||||
"bgids": null,
|
||||
"built_in": 0,
|
||||
"hide": 0,
|
||||
"uuid": 1717556327742611000
|
||||
}
|
||||
@@ -2051,7 +2051,7 @@
|
||||
"cate": "prometheus",
|
||||
"value": "${prom}"
|
||||
},
|
||||
"definition": "label_values(node_cpu_seconds_total, instance)",
|
||||
"definition": "label_values(node_uname_info, instance)",
|
||||
"name": "node",
|
||||
"selected": "$node",
|
||||
"type": "query"
|
||||
|
||||
@@ -252,7 +252,7 @@
|
||||
"cate": "prometheus",
|
||||
"value": "${prom}"
|
||||
},
|
||||
"definition": "label_values(node_cpu_seconds_total, instance)",
|
||||
"definition": "label_values(node_uname_info, instance)",
|
||||
"name": "node",
|
||||
"selected": "$node",
|
||||
"type": "query"
|
||||
|
||||
@@ -259,11 +259,11 @@
|
||||
"uuid": 1717556327796195000,
|
||||
"collector": "Categraf",
|
||||
"typ": "Linux",
|
||||
"name": "OOM 次数统计",
|
||||
"name": "1分钟内 OOM 次数统计",
|
||||
"unit": "none",
|
||||
"note": "取自 `/proc/vmstat`,需要较高版本的内核,没记错的话应该是 4.13 以上版本",
|
||||
"lang": "zh_CN",
|
||||
"expression": "kernel_vmstat_oom_kill",
|
||||
"expression": "increase(kernel_vmstat_oom_kill[1m])",
|
||||
"created_at": 0,
|
||||
"created_by": "",
|
||||
"updated_at": 0,
|
||||
@@ -1334,4 +1334,4 @@
|
||||
"updated_at": 0,
|
||||
"updated_by": ""
|
||||
}
|
||||
]
|
||||
]
|
||||
|
||||
1806
integrations/MySQL/dashboards/MySQL仪表盘-远端.json
Normal file
1806
integrations/MySQL/dashboards/MySQL仪表盘-远端.json
Normal file
File diff suppressed because it is too large
Load Diff
1802
integrations/MySQL/dashboards/MySQL仪表盘.json
Normal file
1802
integrations/MySQL/dashboards/MySQL仪表盘.json
Normal file
File diff suppressed because it is too large
Load Diff
@@ -1,251 +0,0 @@
|
||||
[
|
||||
{
|
||||
"id": 0,
|
||||
"group_id": 0,
|
||||
"cate": "prometheus",
|
||||
"datasource_ids": [
|
||||
0
|
||||
],
|
||||
"cluster": "",
|
||||
"name": "Process X high number of open files - exporter",
|
||||
"note": "",
|
||||
"prod": "metric",
|
||||
"algorithm": "",
|
||||
"algo_params": null,
|
||||
"delay": 0,
|
||||
"severity": 2,
|
||||
"severities": [
|
||||
2
|
||||
],
|
||||
"disabled": 1,
|
||||
"prom_for_duration": 60,
|
||||
"prom_ql": "",
|
||||
"rule_config": {
|
||||
"algo_params": null,
|
||||
"inhibit": false,
|
||||
"prom_ql": "",
|
||||
"queries": [
|
||||
{
|
||||
"prom_ql": "avg by (instance) (namedprocess_namegroup_worst_fd_ratio{groupname=\"X\"}) * 100 \u003e 80",
|
||||
"severity": 2
|
||||
}
|
||||
],
|
||||
"severity": 0
|
||||
},
|
||||
"prom_eval_interval": 15,
|
||||
"enable_stime": "00:00",
|
||||
"enable_stimes": [
|
||||
"00:00"
|
||||
],
|
||||
"enable_etime": "23:59",
|
||||
"enable_etimes": [
|
||||
"23:59"
|
||||
],
|
||||
"enable_days_of_week": [
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6",
|
||||
"0"
|
||||
],
|
||||
"enable_days_of_weeks": [
|
||||
[
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6",
|
||||
"0"
|
||||
]
|
||||
],
|
||||
"enable_in_bg": 0,
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [],
|
||||
"notify_groups_obj": null,
|
||||
"notify_groups": null,
|
||||
"notify_repeat_step": 60,
|
||||
"notify_max_number": 0,
|
||||
"recover_duration": 0,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": [
|
||||
"alertname=ProcessHighOpenFiles"
|
||||
],
|
||||
"annotations": null,
|
||||
"extra_config": null,
|
||||
"create_at": 0,
|
||||
"create_by": "",
|
||||
"update_at": 0,
|
||||
"update_by": "",
|
||||
"uuid": 1717556328248638000
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"group_id": 0,
|
||||
"cate": "prometheus",
|
||||
"datasource_ids": [
|
||||
0
|
||||
],
|
||||
"cluster": "",
|
||||
"name": "Process X is down - exporter",
|
||||
"note": "",
|
||||
"prod": "metric",
|
||||
"algorithm": "",
|
||||
"algo_params": null,
|
||||
"delay": 0,
|
||||
"severity": 1,
|
||||
"severities": [
|
||||
1
|
||||
],
|
||||
"disabled": 1,
|
||||
"prom_for_duration": 0,
|
||||
"prom_ql": "",
|
||||
"rule_config": {
|
||||
"algo_params": null,
|
||||
"inhibit": false,
|
||||
"prom_ql": "",
|
||||
"queries": [
|
||||
{
|
||||
"prom_ql": "sum by (instance) (namedprocess_namegroup_num_procs{groupname=\"X\"}) == 0",
|
||||
"severity": 1
|
||||
}
|
||||
],
|
||||
"severity": 0
|
||||
},
|
||||
"prom_eval_interval": 15,
|
||||
"enable_stime": "00:00",
|
||||
"enable_stimes": [
|
||||
"00:00"
|
||||
],
|
||||
"enable_etime": "23:59",
|
||||
"enable_etimes": [
|
||||
"23:59"
|
||||
],
|
||||
"enable_days_of_week": [
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6",
|
||||
"0"
|
||||
],
|
||||
"enable_days_of_weeks": [
|
||||
[
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6",
|
||||
"0"
|
||||
]
|
||||
],
|
||||
"enable_in_bg": 0,
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [],
|
||||
"notify_groups_obj": null,
|
||||
"notify_groups": null,
|
||||
"notify_repeat_step": 60,
|
||||
"notify_max_number": 0,
|
||||
"recover_duration": 0,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": [
|
||||
"alertname=ProcessNotRunning"
|
||||
],
|
||||
"annotations": null,
|
||||
"extra_config": null,
|
||||
"create_at": 0,
|
||||
"create_by": "",
|
||||
"update_at": 0,
|
||||
"update_by": "",
|
||||
"uuid": 1717556328249307000
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"group_id": 0,
|
||||
"cate": "prometheus",
|
||||
"datasource_ids": [
|
||||
0
|
||||
],
|
||||
"cluster": "",
|
||||
"name": "Process X is restarted - exporter",
|
||||
"note": "",
|
||||
"prod": "metric",
|
||||
"algorithm": "",
|
||||
"algo_params": null,
|
||||
"delay": 0,
|
||||
"severity": 3,
|
||||
"severities": [
|
||||
3
|
||||
],
|
||||
"disabled": 1,
|
||||
"prom_for_duration": 0,
|
||||
"prom_ql": "",
|
||||
"rule_config": {
|
||||
"algo_params": null,
|
||||
"inhibit": false,
|
||||
"prom_ql": "",
|
||||
"queries": [
|
||||
{
|
||||
"prom_ql": "namedprocess_namegroup_oldest_start_time_seconds{groupname=\"X\"} \u003e time() - 60 ",
|
||||
"severity": 3
|
||||
}
|
||||
],
|
||||
"severity": 0
|
||||
},
|
||||
"prom_eval_interval": 15,
|
||||
"enable_stime": "00:00",
|
||||
"enable_stimes": [
|
||||
"00:00"
|
||||
],
|
||||
"enable_etime": "23:59",
|
||||
"enable_etimes": [
|
||||
"23:59"
|
||||
],
|
||||
"enable_days_of_week": [
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6",
|
||||
"0"
|
||||
],
|
||||
"enable_days_of_weeks": [
|
||||
[
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6",
|
||||
"0"
|
||||
]
|
||||
],
|
||||
"enable_in_bg": 0,
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [],
|
||||
"notify_groups_obj": null,
|
||||
"notify_groups": null,
|
||||
"notify_repeat_step": 60,
|
||||
"notify_max_number": 0,
|
||||
"recover_duration": 0,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": [
|
||||
"alertname=ProcessRestarted"
|
||||
],
|
||||
"annotations": null,
|
||||
"extra_config": null,
|
||||
"create_at": 0,
|
||||
"create_by": "",
|
||||
"update_at": 0,
|
||||
"update_by": "",
|
||||
"uuid": 1717556328249744000
|
||||
}
|
||||
]
|
||||
@@ -1,172 +0,0 @@
|
||||
[
|
||||
{
|
||||
"id": 0,
|
||||
"group_id": 0,
|
||||
"cate": "prometheus",
|
||||
"datasource_ids": [
|
||||
0
|
||||
],
|
||||
"cluster": "",
|
||||
"name": "process handle limit is too low",
|
||||
"note": "",
|
||||
"prod": "metric",
|
||||
"algorithm": "",
|
||||
"algo_params": null,
|
||||
"delay": 0,
|
||||
"severity": 3,
|
||||
"severities": [
|
||||
3
|
||||
],
|
||||
"disabled": 1,
|
||||
"prom_for_duration": 60,
|
||||
"prom_ql": "",
|
||||
"rule_config": {
|
||||
"algo_params": null,
|
||||
"inhibit": false,
|
||||
"prom_ql": "",
|
||||
"queries": [
|
||||
{
|
||||
"prom_ql": "procstat_rlimit_num_fds_soft \u003c 2048",
|
||||
"severity": 3
|
||||
}
|
||||
],
|
||||
"severity": 0
|
||||
},
|
||||
"prom_eval_interval": 15,
|
||||
"enable_stime": "00:00",
|
||||
"enable_stimes": [
|
||||
"00:00"
|
||||
],
|
||||
"enable_etime": "23:59",
|
||||
"enable_etimes": [
|
||||
"23:59"
|
||||
],
|
||||
"enable_days_of_week": [
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6",
|
||||
"0"
|
||||
],
|
||||
"enable_days_of_weeks": [
|
||||
[
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6",
|
||||
"0"
|
||||
]
|
||||
],
|
||||
"enable_in_bg": 0,
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [
|
||||
"email",
|
||||
"dingtalk",
|
||||
"wecom"
|
||||
],
|
||||
"notify_groups_obj": null,
|
||||
"notify_groups": null,
|
||||
"notify_repeat_step": 60,
|
||||
"notify_max_number": 0,
|
||||
"recover_duration": 0,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": [],
|
||||
"annotations": null,
|
||||
"extra_config": null,
|
||||
"create_at": 0,
|
||||
"create_by": "",
|
||||
"update_at": 0,
|
||||
"update_by": "",
|
||||
"uuid": 1717556328251224000
|
||||
},
|
||||
{
|
||||
"id": 0,
|
||||
"group_id": 0,
|
||||
"cate": "prometheus",
|
||||
"datasource_ids": [
|
||||
0
|
||||
],
|
||||
"cluster": "",
|
||||
"name": "there is a process count of 0, indicating that a certain process may have crashed",
|
||||
"note": "",
|
||||
"prod": "metric",
|
||||
"algorithm": "",
|
||||
"algo_params": null,
|
||||
"delay": 0,
|
||||
"severity": 1,
|
||||
"severities": [
|
||||
1
|
||||
],
|
||||
"disabled": 1,
|
||||
"prom_for_duration": 60,
|
||||
"prom_ql": "",
|
||||
"rule_config": {
|
||||
"algo_params": null,
|
||||
"inhibit": false,
|
||||
"prom_ql": "",
|
||||
"queries": [
|
||||
{
|
||||
"prom_ql": "procstat_lookup_count == 0",
|
||||
"severity": 1
|
||||
}
|
||||
],
|
||||
"severity": 0
|
||||
},
|
||||
"prom_eval_interval": 15,
|
||||
"enable_stime": "00:00",
|
||||
"enable_stimes": [
|
||||
"00:00"
|
||||
],
|
||||
"enable_etime": "23:59",
|
||||
"enable_etimes": [
|
||||
"23:59"
|
||||
],
|
||||
"enable_days_of_week": [
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6",
|
||||
"0"
|
||||
],
|
||||
"enable_days_of_weeks": [
|
||||
[
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6",
|
||||
"0"
|
||||
]
|
||||
],
|
||||
"enable_in_bg": 0,
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [
|
||||
"email",
|
||||
"dingtalk",
|
||||
"wecom"
|
||||
],
|
||||
"notify_groups_obj": null,
|
||||
"notify_groups": null,
|
||||
"notify_repeat_step": 60,
|
||||
"notify_max_number": 0,
|
||||
"recover_duration": 0,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": [],
|
||||
"annotations": null,
|
||||
"extra_config": null,
|
||||
"create_at": 0,
|
||||
"create_by": "",
|
||||
"update_at": 0,
|
||||
"update_by": "",
|
||||
"uuid": 1717556328254260000
|
||||
}
|
||||
]
|
||||
File diff suppressed because it is too large
Load Diff
Binary file not shown.
|
Before Width: | Height: | Size: 2.2 KiB |
File diff suppressed because it is too large
Load Diff
@@ -2085,7 +2085,6 @@
|
||||
],
|
||||
"var": [
|
||||
{
|
||||
"defaultValue": 2,
|
||||
"definition": "prometheus",
|
||||
"label": "datasource",
|
||||
"name": "datasource",
|
||||
|
||||
File diff suppressed because it is too large
Load Diff
Some files were not shown because too many files have changed in this diff Show More
Reference in New Issue
Block a user