mirror of
https://github.com/ccfos/nightingale.git
synced 2026-03-03 14:38:55 +00:00
Compare commits
25 Commits
webhook-ba
...
readme_en
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
acfb010593 | ||
|
|
7da4c99d92 | ||
|
|
6b46e7e83f | ||
|
|
514ccd5f90 | ||
|
|
4565b80717 | ||
|
|
2bac6588c4 | ||
|
|
fc293cb01c | ||
|
|
73f9548242 | ||
|
|
7c91e51c08 | ||
|
|
a4867c406d | ||
|
|
bfea83ae75 | ||
|
|
7a2832c377 | ||
|
|
3f6c54a712 | ||
|
|
1bb590ce6d | ||
|
|
656326458f | ||
|
|
c6ab3ad2b3 | ||
|
|
d050cf72e9 | ||
|
|
084cc1893e | ||
|
|
cd01123b59 | ||
|
|
23ce84d41c | ||
|
|
4764cc2419 | ||
|
|
da66401576 | ||
|
|
0024c9d99c | ||
|
|
96d3b48f10 | ||
|
|
6a0e7a810f |
17
README.md
17
README.md
@@ -29,15 +29,15 @@
|
||||
|
||||
## 夜莺 Nightingale 是什么
|
||||
|
||||
夜莺监控是一款开源云原生观测分析工具,采用 All-in-One 的设计理念,集数据采集、可视化、监控告警、数据分析于一体,与云原生生态紧密集成,提供开箱即用的企业级监控分析和告警能力。夜莺于 2020 年 3 月 20 日,在 github 上发布 v1 版本,已累计迭代 100 多个版本。
|
||||
夜莺监控是一款开源云原生观测分析工具,采用 All-in-One 的设计理念,集数据采集、可视化、监控告警、数据分析于一体,与云原生生态紧密集成,提供开箱即用的企业级监控分析和告警能力。夜莺于 2020 年 3 月 20 日,在 GitHub 上发布 v1 版本,已累计迭代 100 多个版本。
|
||||
|
||||
夜莺最初由滴滴开发和开源,并于 2022 年 5 月 11 日,捐赠予中国计算机学会开源发展委员会(CCF ODC),为 CCF ODC 成立后接受捐赠的第一个开源项目。夜莺的核心研发团队,也是 Open-Falcon 项目原核心研发人员,从 2014 年(Open-Falcon 是 2014 年开源)算起来,也有 10 年了,只为把监控这个事情做好。
|
||||
|
||||
|
||||
## 快速开始
|
||||
- 👉[文档中心](https://flashcat.cloud/docs/) | [下载中心](https://flashcat.cloud/download/nightingale/)
|
||||
- ❤️[报告 Bug](https://github.com/ccfos/nightingale/issues/new?assignees=&labels=&projects=&template=question.yml)
|
||||
- ℹ️为了提供更快速的访问体验,上述文档和下载站点托管于 [FlashcatCloud](https://flashcat.cloud)
|
||||
- 👉 [文档中心](https://flashcat.cloud/docs/) | [下载中心](https://flashcat.cloud/download/nightingale/)
|
||||
- ❤️ [报告 Bug](https://github.com/ccfos/nightingale/issues/new?assignees=&labels=&projects=&template=question.yml)
|
||||
- ℹ️ 为了提供更快速的访问体验,上述文档和下载站点托管于 [FlashcatCloud](https://flashcat.cloud)
|
||||
|
||||
## 功能特点
|
||||
|
||||
@@ -50,6 +50,11 @@
|
||||
|
||||
## 截图演示
|
||||
|
||||
|
||||
你可以在页面的右上角,切换语言和主题,目前我们支持英语、简体中文、繁体中文。
|
||||
|
||||

|
||||
|
||||
即时查询,类似 Prometheus 内置的查询分析页面,做 ad-hoc 查询,夜莺做了一些 UI 优化,同时提供了一些内置 promql 指标,让不太了解 promql 的用户也可以快速查询。
|
||||
|
||||

|
||||
@@ -90,8 +95,8 @@
|
||||
|
||||
|
||||
## 社区共建
|
||||
- ❇️请阅读浏览[夜莺开源项目和社区治理架构草案](./doc/community-governance.md),真诚欢迎每一位用户、开发者、公司以及组织,使用夜莺监控、积极反馈 Bug、提交功能需求、分享最佳实践,共建专业、活跃的夜莺开源社区。
|
||||
- 夜莺贡献者❤️
|
||||
- ❇️ 请阅读浏览[夜莺开源项目和社区治理架构草案](./doc/community-governance.md),真诚欢迎每一位用户、开发者、公司以及组织,使用夜莺监控、积极反馈 Bug、提交功能需求、分享最佳实践,共建专业、活跃的夜莺开源社区。
|
||||
- ❤️ 夜莺贡献者
|
||||
<a href="https://github.com/ccfos/nightingale/graphs/contributors">
|
||||
<img src="https://contrib.rocks/image?repo=ccfos/nightingale" />
|
||||
</a>
|
||||
|
||||
129
README_en.md
129
README_en.md
@@ -1,104 +1,113 @@
|
||||
<p align="center">
|
||||
<a href="https://github.com/ccfos/nightingale">
|
||||
<img src="doc/img/Nightingale_L_V.png" alt="nightingale - cloud native monitoring" width="240" /></a>
|
||||
<img src="doc/img/Nightingale_L_V.png" alt="nightingale - cloud native monitoring" width="100" /></a>
|
||||
</p>
|
||||
<p align="center">
|
||||
<b>Open-source Alert Management Expert, an Integrated Observability Platform</b>
|
||||
</p>
|
||||
|
||||
<p align="center">
|
||||
<img alt="GitHub latest release" src="https://img.shields.io/github/v/release/ccfos/nightingale"/>
|
||||
<a href="https://n9e.github.io">
|
||||
<a href="https://flashcat.cloud/docs/">
|
||||
<img alt="Docs" src="https://img.shields.io/badge/docs-get%20started-brightgreen"/></a>
|
||||
<a href="https://hub.docker.com/u/flashcatcloud">
|
||||
<img alt="Docker pulls" src="https://img.shields.io/docker/pulls/flashcatcloud/nightingale"/></a>
|
||||
<img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/ccfos/nightingale">
|
||||
<img alt="GitHub Repo issues" src="https://img.shields.io/github/issues/ccfos/nightingale">
|
||||
<img alt="GitHub Repo issues closed" src="https://img.shields.io/github/issues-closed/ccfos/nightingale">
|
||||
<img alt="GitHub forks" src="https://img.shields.io/github/forks/ccfos/nightingale">
|
||||
<a href="https://github.com/ccfos/nightingale/graphs/contributors">
|
||||
<img alt="GitHub contributors" src="https://img.shields.io/github/contributors-anon/ccfos/nightingale"/></a>
|
||||
<img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/ccfos/nightingale">
|
||||
<img alt="GitHub forks" src="https://img.shields.io/github/forks/ccfos/nightingale">
|
||||
<br/><img alt="GitHub Repo issues" src="https://img.shields.io/github/issues/ccfos/nightingale">
|
||||
<img alt="GitHub Repo issues closed" src="https://img.shields.io/github/issues-closed/ccfos/nightingale">
|
||||
<img alt="GitHub latest release" src="https://img.shields.io/github/v/release/ccfos/nightingale"/>
|
||||
<img alt="License" src="https://img.shields.io/badge/license-Apache--2.0-blue"/>
|
||||
<a href="https://n9e-talk.slack.com/">
|
||||
<img alt="GitHub contributors" src="https://img.shields.io/badge/join%20slack-%23n9e-brightgreen.svg"/></a>
|
||||
<img alt="License" src="https://img.shields.io/badge/license-Apache--2.0-blue"/>
|
||||
</p>
|
||||
<p align="center">
|
||||
An open-source cloud-native monitoring system that is <b>all-in-one</b> <br/>
|
||||
<b>Out-of-the-box</b>, it integrates data collection, visualization, and monitoring alert <br/>
|
||||
We recommend upgrading your <b>Prometheus + AlertManager + Grafana</b> combination to Nightingale!
|
||||
</p>
|
||||
|
||||
|
||||
|
||||
[English](./README_en.md) | [中文](./README.md)
|
||||
|
||||
## What is Nightingale
|
||||
|
||||
## Highlighted Features
|
||||
Nightingale aims to combine the advantages of Prometheus and Grafana. It manages alert rules and visualizes metrics, logs, traces in a beautiful WebUI.
|
||||
|
||||
- **Out-of-the-box**
|
||||
- Supports multiple deployment methods such as **Docker, Helm Chart, and cloud services**, integrates data collection, monitoring, and alerting into one system, and comes with various monitoring dashboards, quick views, and alert rule templates. **It greatly reduces the construction cost, learning cost, and usage cost of cloud-native monitoring systems**.
|
||||
- **Professional Alerting**
|
||||
- Provides visual alert configuration and management, supports various alert rules, offers the ability to configure silence and subscription rules, supports multiple alert delivery channels, and has features such as alert self-healing and event management.
|
||||
- **Cloud-Native**
|
||||
- Quickly builds an enterprise-level cloud-native monitoring system through a turnkey approach, supports multiple collectors such as [Categraf](https://github.com/flashcatcloud/categraf), Telegraf, and Grafana-agent, supports multiple data sources such as Prometheus, VictoriaMetrics, M3DB, ElasticSearch, and Jaeger, and is compatible with importing Grafana dashboards. **It seamlessly integrates with the cloud-native ecosystem**.
|
||||
- **High Performance and High Availability**
|
||||
- Due to the multi-data-source management engine of Nightingale and its excellent architecture design, and utilizing a high-performance time-series database, it can handle data collection, storage, and alert analysis scenarios with billions of time-series data, saving a lot of costs.
|
||||
- Nightingale components can be horizontally scaled with no single point of failure. It has been deployed in thousands of enterprises and tested in harsh production practices. Many leading Internet companies have used Nightingale for cluster machines with hundreds of nodes, processing billions of time-series data.
|
||||
- **Flexible Extension and Centralized Management**
|
||||
- Nightingale can be deployed on a 1-core 1G cloud host, deployed in a cluster of hundreds of machines, or run in Kubernetes. Time-series databases, alert engines, and other components can also be decentralized to various data centers and regions, balancing edge deployment with centralized management. **It solves the problem of data fragmentation and lack of unified views**.
|
||||
Originally developed and open-sourced by Didi, Nightingale was donated to the China Computer Federation Open Source Development Committee (CCF ODC) on May 11, 2022, becoming the first open-source project accepted by the CCF ODC after its establishment.
|
||||
|
||||
|
||||
#### If you are using Prometheus and have one or more of the following requirement scenarios, it is recommended that you upgrade to Nightingale:
|
||||
## Quick Start
|
||||
|
||||
- Multiple systems such as Prometheus, Alertmanager, Grafana, etc. are fragmented and lack a unified view and cannot be used out of the box;
|
||||
- The way to manage Prometheus and Alertmanager by modifying configuration files has a big learning curve and is difficult to collaborate;
|
||||
- Too much data to scale-up your Prometheus cluster;
|
||||
- Multiple Prometheus clusters running in production environments, which faced high management and usage costs;
|
||||
- 👉 [Documentation](https://flashcat.cloud/docs/) | [Download](https://flashcat.cloud/download/nightingale/)
|
||||
- ❤️ [Report a Bug](https://github.com/ccfos/nightingale/issues/new?assignees=&labels=&projects=&template=question.yml)
|
||||
- ℹ️ For faster access, the above documentation and download sites are hosted on [FlashcatCloud](https://flashcat.cloud).
|
||||
|
||||
#### If you are using Zabbix and have the following scenarios, it is recommended that you upgrade to Nightingale:
|
||||
## Features
|
||||
|
||||
- Monitoring too much data and wanting a better scalable solution;
|
||||
- A high learning curve and a desire for better efficiency of collaborative use in a multi-person, multi-team model;
|
||||
- Microservice and cloud-native architectures with variable monitoring data lifecycles and high monitoring data dimension bases, which are not easily adaptable to the Zabbix data model;
|
||||
- **Integration with Multiple Time-Series Databases:** Supports integration with various time-series databases such as Prometheus, VictoriaMetrics, Thanos, Mimir, M3DB, and TDengine, enabling unified alert management.
|
||||
- **Advanced Alerting Capabilities:** Comes with built-in support for multiple alerting rules, extensible to common notification channels. It also supports alert suppression, silencing, subscription, self-healing, and alert event management.
|
||||
- **High-Performance Visualization Engine:** Offers various chart styles with numerous built-in dashboard templates and the ability to import Grafana templates. Ready to use with a business-friendly open-source license.
|
||||
- **Support for Common Collectors:** Compatible with [Categraf](https://flashcat.cloud/product/categraf), Telegraf, Grafana-agent, Datadog-agent, and various exporters as collectors—there's no data that can't be monitored.
|
||||
- **Seamless Integration with [Flashduty](https://flashcat.cloud/product/flashcat-duty/):** Enables alert aggregation, acknowledgment, escalation, scheduling, and IM integration, ensuring no alerts are missed, reducing unnecessary interruptions, and enhancing efficient collaboration.
|
||||
|
||||
|
||||
#### If you are using [open-falcon](https://github.com/open-falcon/falcon-plus), we recommend you to upgrade to Nightingale:
|
||||
- For more information about open-falcon and Nightingale, please refer to read [Ten features and trends of cloud-native monitoring](https://mp.weixin.qq.com/s?__biz=MzkzNjI5OTM5Nw==&mid=2247483738&idx=1&sn=e8bdbb974a2cd003c1abcc2b5405dd18&chksm=c2a19fb0f5d616a63185cd79277a79a6b80118ef2185890d0683d2bb20451bd9303c78d083c5#rd)。
|
||||
|
||||
## Getting Started
|
||||
|
||||
[https://n9e.github.io/](https://n9e.github.io/)
|
||||
|
||||
## Screenshots
|
||||
|
||||
https://user-images.githubusercontent.com/792850/216888712-2565fcea-9df5-47bd-a49e-d60af9bd76e8.mp4
|
||||
You can switch languages and themes in the top right corner. We now support English, Simplified Chinese, and Traditional Chinese.
|
||||
|
||||

|
||||
|
||||
### Instant Query
|
||||
|
||||
Similar to the built-in query analysis page in Prometheus, Nightingale offers an ad-hoc query feature with UI enhancements. It also provides built-in PromQL metrics, allowing users unfamiliar with PromQL to quickly perform queries.
|
||||
|
||||

|
||||
|
||||
### Metric View
|
||||
|
||||
Alternatively, you can use the Metric View to access data. With this feature, Instant Query becomes less necessary, as it caters more to advanced users. Regular users can easily perform queries using the Metric View.
|
||||
|
||||

|
||||
|
||||
### Built-in Dashboards
|
||||
|
||||
Nightingale includes commonly used dashboards that can be imported and used directly. You can also import Grafana dashboards, although compatibility is limited to basic Grafana charts. If you’re accustomed to Grafana, it’s recommended to continue using it for visualization, with Nightingale serving as an alerting engine.
|
||||
|
||||

|
||||
|
||||
### Built-in Alert Rules
|
||||
|
||||
In addition to the built-in dashboards, Nightingale also comes with numerous alert rules that are ready to use out of the box.
|
||||
|
||||

|
||||
|
||||
|
||||
|
||||
## Architecture
|
||||
|
||||
<img src="doc/img/arch-product.png" width="600">
|
||||
In most community scenarios, Nightingale is primarily used as an alert engine, integrating with multiple time-series databases to unify alert rule management. Grafana remains the preferred tool for visualization. As an alert engine, the product architecture of Nightingale is as follows:
|
||||
|
||||
Nightingale monitoring can receive monitoring data reported by various collectors (such as [Categraf](https://github.com/flashcatcloud/categraf) , telegraf, grafana-agent, Prometheus, etc.) and write them to various popular time-series databases (such as Prometheus, M3DB, VictoriaMetrics, Thanos, TDEngine, etc.). It provides configuration capabilities for alert rules, silence rules, and subscription rules, as well as the ability to view monitoring data. It also provides automatic alarm self-healing mechanisms (such as automatically calling back to a webhook address or executing a script after an alarm is triggered), and the ability to store and manage historical alarm events and view them in groups.
|
||||

|
||||
|
||||
If the performance of a standalone time-series database (such as Prometheus) has bottlenecks or poor disaster recovery, we recommend using [VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics). The VictoriaMetrics architecture is relatively simple, has excellent performance, and is easy to deploy and maintain. The architecture diagram is as shown above. For more detailed documentation on VictoriaMetrics, please refer to its [official website](https://victoriametrics.com/).
|
||||
For certain edge data centers with poor network connectivity to the central Nightingale server, we offer a distributed deployment mode for the alert engine. In this mode, even if the network is disconnected, the alerting functionality remains unaffected.
|
||||
|
||||
**We welcome you to participate in the Nightingale open-source project and community in various ways, including but not limited to**:
|
||||
- Adding and improving documentation => [n9e.github.io](https://n9e.github.io/)
|
||||
- Sharing your best practices and experience in using Nightingale monitoring => [Article sharing]((https://n9e.github.io/docs/prologue/share/))
|
||||
- Submitting product suggestions => [github issue](https://github.com/ccfos/nightingale/issues/new?assignees=&labels=kind%2Ffeature&template=enhancement.md)
|
||||
- Submitting code to make Nightingale monitoring faster, more stable, and easier to use => [github pull request](https://github.com/didi/nightingale/pulls)
|
||||

|
||||
|
||||
|
||||
**Respecting, recognizing, and recording the work of every contributor** is the first guiding principle of the Nightingale open-source community. We advocate effective questioning, which not only respects the developer's time but also contributes to the accumulation of knowledge in the entire community
|
||||
- Before asking a question, please first refer to the [FAQ](https://www.gitlink.org.cn/ccfos/nightingale/wiki/faq)
|
||||
- We use [GitHub Discussions](https://github.com/ccfos/nightingale/discussions) as the communication forum. You can search and ask questions here.
|
||||
- We also recommend that you join ours [Slack channel](https://n9e-talk.slack.com/) to exchange experiences with other Nightingale users.
|
||||
## Communication Channels
|
||||
|
||||
|
||||
## Who is using Nightingale
|
||||
You can register your usage and share your experience by posting on **[Who is Using Nightingale](https://github.com/ccfos/nightingale/issues/897)**.
|
||||
- **Report Bugs:** It is highly recommended to submit issues via the [Nightingale GitHub Issue tracker](https://github.com/ccfos/nightingale/issues/new?assignees=&labels=kind%2Fbug&projects=&template=bug_report.yml).
|
||||
- **Documentation:** For more information, we recommend thoroughly browsing the [Nightingale Documentation Site](https://flashcat.cloud/docs/content/flashcat-monitor/nightingale-v7/introduction/).
|
||||
|
||||
## Stargazers over time
|
||||
[](https://starchart.cc/ccfos/nightingale)
|
||||
|
||||
## Contributors
|
||||
[](https://star-history.com/#ccfos/nightingale&Date)
|
||||
|
||||
## Community Co-Building
|
||||
|
||||
- ❇️ Please read the [Nightingale Open Source Project and Community Governance Draft](./doc/community-governance.md). We sincerely welcome every user, developer, company, and organization to use Nightingale, actively report bugs, submit feature requests, share best practices, and help build a professional and active open-source community.
|
||||
- ❤️ Nightingale Contributors
|
||||
<a href="https://github.com/ccfos/nightingale/graphs/contributors">
|
||||
<img src="https://contrib.rocks/image?repo=ccfos/nightingale" />
|
||||
</a>
|
||||
|
||||
## License
|
||||
[Apache License V2.0](https://github.com/didi/nightingale/blob/main/LICENSE)
|
||||
- [Apache License V2.0](https://github.com/didi/nightingale/blob/main/LICENSE)
|
||||
|
||||
@@ -32,6 +32,7 @@ type Alerting struct {
|
||||
Timeout int64
|
||||
TemplatesDir string
|
||||
NotifyConcurrency int
|
||||
WebhookBatchSend bool
|
||||
}
|
||||
|
||||
type CallPlugin struct {
|
||||
|
||||
@@ -112,5 +112,5 @@ func Start(alertc aconf.Alert, pushgwc pconf.Pushgw, syncStats *memsto.Stats, al
|
||||
go consumer.LoopConsume()
|
||||
|
||||
go queue.ReportQueueSize(alertStats)
|
||||
go sender.InitEmailSender(notifyConfigCache)
|
||||
go sender.InitEmailSender(ctx, notifyConfigCache)
|
||||
}
|
||||
|
||||
@@ -167,7 +167,7 @@ func (e *Dispatch) HandleEventNotify(event *models.AlertCurEvent, isSubscribe bo
|
||||
}
|
||||
|
||||
// 处理事件发送,这里用一个goroutine处理一个event的所有发送事件
|
||||
go e.Send(rule, event, notifyTarget)
|
||||
go e.Send(rule, event, notifyTarget, isSubscribe)
|
||||
|
||||
// 如果是不是订阅规则出现的event, 则需要处理订阅规则的event
|
||||
if !isSubscribe {
|
||||
@@ -238,11 +238,12 @@ func (e *Dispatch) handleSub(sub *models.AlertSubscribe, event models.AlertCurEv
|
||||
e.HandleEventNotify(&event, true)
|
||||
}
|
||||
|
||||
func (e *Dispatch) Send(rule *models.AlertRule, event *models.AlertCurEvent, notifyTarget *NotifyTarget) {
|
||||
func (e *Dispatch) Send(rule *models.AlertRule, event *models.AlertCurEvent, notifyTarget *NotifyTarget, isSubscribe bool) {
|
||||
needSend := e.BeforeSenderHook(event)
|
||||
if needSend {
|
||||
for channel, uids := range notifyTarget.ToChannelUserMap() {
|
||||
msgCtx := sender.BuildMessageContext(rule, []*models.AlertCurEvent{event}, uids, e.userCache, e.Astats)
|
||||
msgCtx := sender.BuildMessageContext(e.ctx, rule, []*models.AlertCurEvent{event},
|
||||
uids, e.userCache, e.Astats)
|
||||
e.RwLock.RLock()
|
||||
s := e.Senders[channel]
|
||||
e.RwLock.RUnlock()
|
||||
@@ -265,10 +266,19 @@ func (e *Dispatch) Send(rule *models.AlertRule, event *models.AlertCurEvent, not
|
||||
e.SendCallbacks(rule, notifyTarget, event)
|
||||
|
||||
// handle global webhooks
|
||||
sender.SendWebhooks(notifyTarget.ToWebhookList(), event, e.Astats)
|
||||
if e.alerting.WebhookBatchSend {
|
||||
sender.BatchSendWebhooks(e.ctx, notifyTarget.ToWebhookList(), event, e.Astats)
|
||||
} else {
|
||||
sender.SingleSendWebhooks(e.ctx, notifyTarget.ToWebhookList(), event, e.Astats)
|
||||
}
|
||||
|
||||
// handle plugin call
|
||||
go sender.MayPluginNotify(e.genNoticeBytes(event), e.notifyConfigCache.GetNotifyScript(), e.Astats)
|
||||
if !isSubscribe {
|
||||
// handle ibex callbacks
|
||||
e.HandleIbex(rule, event)
|
||||
// handle plugin call
|
||||
go sender.MayPluginNotify(e.ctx, e.genNoticeBytes(event), e.notifyConfigCache.
|
||||
GetNotifyScript(), e.Astats, event)
|
||||
}
|
||||
}
|
||||
|
||||
func (e *Dispatch) SendCallbacks(rule *models.AlertRule, notifyTarget *NotifyTarget, event *models.AlertCurEvent) {
|
||||
@@ -280,7 +290,7 @@ func (e *Dispatch) SendCallbacks(rule *models.AlertRule, notifyTarget *NotifyTar
|
||||
continue
|
||||
}
|
||||
|
||||
cbCtx := sender.BuildCallBackContext(e.ctx, urlStr, rule, []*models.AlertCurEvent{event}, uids, e.userCache, e.Astats)
|
||||
cbCtx := sender.BuildCallBackContext(e.ctx, urlStr, rule, []*models.AlertCurEvent{event}, uids, e.userCache, e.alerting.WebhookBatchSend, e.Astats)
|
||||
|
||||
if strings.HasPrefix(urlStr, "${ibex}") {
|
||||
e.CallBacks[models.IbexDomain].CallBack(cbCtx)
|
||||
@@ -318,6 +328,30 @@ func (e *Dispatch) SendCallbacks(rule *models.AlertRule, notifyTarget *NotifyTar
|
||||
}
|
||||
}
|
||||
|
||||
func (e *Dispatch) HandleIbex(rule *models.AlertRule, event *models.AlertCurEvent) {
|
||||
// 解析 RuleConfig 字段
|
||||
var ruleConfig struct {
|
||||
TaskTpls []*models.Tpl `json:"task_tpls"`
|
||||
}
|
||||
json.Unmarshal([]byte(rule.RuleConfig), &ruleConfig)
|
||||
|
||||
for _, t := range ruleConfig.TaskTpls {
|
||||
if t.TplId == 0 {
|
||||
continue
|
||||
}
|
||||
|
||||
if len(t.Host) == 0 {
|
||||
sender.CallIbex(e.ctx, t.TplId, event.TargetIdent,
|
||||
e.taskTplsCache, e.targetCache, e.userCache, event)
|
||||
continue
|
||||
}
|
||||
for _, host := range t.Host {
|
||||
sender.CallIbex(e.ctx, t.TplId, host,
|
||||
e.taskTplsCache, e.targetCache, e.userCache, event)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
type Notice struct {
|
||||
Event *models.AlertCurEvent `json:"event"`
|
||||
Tpls map[string]string `json:"tpls"`
|
||||
|
||||
@@ -79,6 +79,22 @@ func (s *NotifyTarget) ToCallbackList() []string {
|
||||
func (s *NotifyTarget) ToWebhookList() []*models.Webhook {
|
||||
webhooks := make([]*models.Webhook, 0, len(s.webhooks))
|
||||
for _, wh := range s.webhooks {
|
||||
if wh.Batch == 0 {
|
||||
wh.Batch = 1000
|
||||
}
|
||||
|
||||
if wh.Timeout == 0 {
|
||||
wh.Timeout = 10
|
||||
}
|
||||
|
||||
if wh.RetryCount == 0 {
|
||||
wh.RetryCount = 10
|
||||
}
|
||||
|
||||
if wh.RetryInterval == 0 {
|
||||
wh.RetryInterval = 10
|
||||
}
|
||||
|
||||
webhooks = append(webhooks, wh)
|
||||
}
|
||||
return webhooks
|
||||
|
||||
@@ -53,10 +53,11 @@ type Processor struct {
|
||||
datasourceId int64
|
||||
EngineName string
|
||||
|
||||
rule *models.AlertRule
|
||||
fires *AlertCurEventMap
|
||||
pendings *AlertCurEventMap
|
||||
inhibit bool
|
||||
rule *models.AlertRule
|
||||
fires *AlertCurEventMap
|
||||
pendings *AlertCurEventMap
|
||||
pendingsUseByRecover *AlertCurEventMap
|
||||
inhibit bool
|
||||
|
||||
tagsMap map[string]string
|
||||
tagsArr []string
|
||||
@@ -232,17 +233,17 @@ func Relabel(rule *models.AlertRule, event *models.AlertCurEvent) {
|
||||
return
|
||||
}
|
||||
|
||||
if len(rule.EventRelabelConfig) == 0 {
|
||||
return
|
||||
}
|
||||
|
||||
// need to keep the original label
|
||||
event.OriginalTags = event.Tags
|
||||
event.OriginalTagsJSON = make([]string, len(event.TagsJSON))
|
||||
|
||||
labels := make([]prompb.Label, len(event.TagsJSON))
|
||||
for i, tag := range event.TagsJSON {
|
||||
label := strings.Split(tag, "=")
|
||||
if len(label) != 2 {
|
||||
logger.Errorf("event%+v relabel: the label length is not 2:%v", event, label)
|
||||
continue
|
||||
}
|
||||
label := strings.SplitN(tag, "=", 2)
|
||||
event.OriginalTagsJSON[i] = tag
|
||||
labels[i] = prompb.Label{Name: label[0], Value: label[1]}
|
||||
}
|
||||
@@ -342,11 +343,22 @@ func (p *Processor) RecoverSingle(hash string, now int64, value *string, values
|
||||
if !has {
|
||||
return
|
||||
}
|
||||
|
||||
// 如果配置了留观时长,就不能立马恢复了
|
||||
if cachedRule.RecoverDuration > 0 && now-event.LastEvalTime < cachedRule.RecoverDuration {
|
||||
logger.Debugf("rule_eval:%s event:%v not recover", p.Key(), event)
|
||||
return
|
||||
if cachedRule.RecoverDuration > 0 {
|
||||
lastPendingEvent, has := p.pendingsUseByRecover.Get(hash)
|
||||
if !has {
|
||||
// 说明没有产生过异常点,就不需要恢复了
|
||||
logger.Debugf("rule_eval:%s event:%v do not has pending event, not recover", p.Key(), event)
|
||||
return
|
||||
}
|
||||
|
||||
if now-lastPendingEvent.LastEvalTime < cachedRule.RecoverDuration {
|
||||
logger.Debugf("rule_eval:%s event:%v not recover", p.Key(), event)
|
||||
return
|
||||
}
|
||||
}
|
||||
|
||||
if value != nil {
|
||||
event.TriggerValue = *value
|
||||
if len(values) > 0 {
|
||||
@@ -358,6 +370,7 @@ func (p *Processor) RecoverSingle(hash string, now int64, value *string, values
|
||||
// 我确实无法分辨,是prom中有值但是未满足阈值所以没返回,还是prom中确实丢了一些点导致没有数据可以返回,尴尬
|
||||
p.fires.Delete(hash)
|
||||
p.pendings.Delete(hash)
|
||||
p.pendingsUseByRecover.Delete(hash)
|
||||
|
||||
// 可能是因为调整了promql才恢复的,所以事件里边要体现最新的promql,否则用户会比较困惑
|
||||
// 当然,其实rule的各个字段都可能发生变化了,都更新一下吧
|
||||
@@ -389,9 +402,11 @@ func (p *Processor) handleEvent(events []*models.AlertCurEvent) {
|
||||
preEvent, has := p.pendings.Get(event.Hash)
|
||||
if has {
|
||||
p.pendings.UpdateLastEvalTime(event.Hash, event.LastEvalTime)
|
||||
p.pendingsUseByRecover.UpdateLastEvalTime(event.Hash, event.LastEvalTime)
|
||||
preTriggerTime = preEvent.TriggerTime
|
||||
} else {
|
||||
p.pendings.Set(event.Hash, event)
|
||||
p.pendingsUseByRecover.Set(event.Hash, event)
|
||||
preTriggerTime = event.TriggerTime
|
||||
}
|
||||
|
||||
@@ -477,6 +492,7 @@ func (p *Processor) pushEventToQueue(e *models.AlertCurEvent) {
|
||||
|
||||
func (p *Processor) RecoverAlertCurEventFromDb() {
|
||||
p.pendings = NewAlertCurEventMap(nil)
|
||||
p.pendingsUseByRecover = NewAlertCurEventMap(nil)
|
||||
|
||||
curEvents, err := models.AlertCurEventGetByRuleIdAndDsId(p.ctx, p.rule.Id, p.datasourceId)
|
||||
if err != nil {
|
||||
@@ -576,6 +592,7 @@ func (p *Processor) mayHandleGroup() {
|
||||
func (p *Processor) DeleteProcessEvent(hash string) {
|
||||
p.fires.Delete(hash)
|
||||
p.pendings.Delete(hash)
|
||||
p.pendingsUseByRecover.Delete(hash)
|
||||
}
|
||||
|
||||
func labelMapToArr(m map[string]string) []string {
|
||||
|
||||
@@ -29,13 +29,14 @@ type (
|
||||
Rule *models.AlertRule
|
||||
Events []*models.AlertCurEvent
|
||||
Stats *astats.Stats
|
||||
BatchSend bool
|
||||
}
|
||||
|
||||
DefaultCallBacker struct{}
|
||||
)
|
||||
|
||||
func BuildCallBackContext(ctx *ctx.Context, callBackURL string, rule *models.AlertRule, events []*models.AlertCurEvent,
|
||||
uids []int64, userCache *memsto.UserCacheType, stats *astats.Stats) CallBackContext {
|
||||
uids []int64, userCache *memsto.UserCacheType, batchSend bool, stats *astats.Stats) CallBackContext {
|
||||
users := userCache.GetByUserIds(uids)
|
||||
|
||||
newCallBackUrl, _ := events[0].ParseURL(callBackURL)
|
||||
@@ -45,6 +46,7 @@ func BuildCallBackContext(ctx *ctx.Context, callBackURL string, rule *models.Ale
|
||||
Rule: rule,
|
||||
Events: events,
|
||||
Users: users,
|
||||
BatchSend: batchSend,
|
||||
Stats: stats,
|
||||
}
|
||||
}
|
||||
@@ -112,6 +114,21 @@ func (c *DefaultCallBacker) CallBack(ctx CallBackContext) {
|
||||
|
||||
event := ctx.Events[0]
|
||||
|
||||
if ctx.BatchSend {
|
||||
webhookConf := &models.Webhook{
|
||||
Type: models.RuleCallback,
|
||||
Enable: true,
|
||||
Url: ctx.CallBackURL,
|
||||
Timeout: 5,
|
||||
RetryCount: 3,
|
||||
RetryInterval: 10,
|
||||
Batch: 1000,
|
||||
}
|
||||
|
||||
PushCallbackEvent(ctx.Ctx, webhookConf, event, ctx.Stats)
|
||||
return
|
||||
}
|
||||
|
||||
ctx.Stats.AlertNotifyTotal.WithLabelValues("rule_callback").Inc()
|
||||
resp, code, err := poster.PostJSON(ctx.CallBackURL, 5*time.Second, event, 3)
|
||||
if err != nil {
|
||||
@@ -124,19 +141,73 @@ func (c *DefaultCallBacker) CallBack(ctx CallBackContext) {
|
||||
}
|
||||
}
|
||||
|
||||
func doSend(url string, body interface{}, channel string, stats *astats.Stats) {
|
||||
func doSendAndRecord(ctx *ctx.Context, url, token string, body interface{}, channel string,
|
||||
stats *astats.Stats, event *models.AlertCurEvent) {
|
||||
res, err := doSend(url, body, channel, stats)
|
||||
NotifyRecord(ctx, event, channel, token, res, err)
|
||||
}
|
||||
|
||||
func NotifyRecord(ctx *ctx.Context, evt *models.AlertCurEvent, channel, target, res string, err error) {
|
||||
noti := models.NewNotificationRecord(evt, channel, target)
|
||||
if err != nil {
|
||||
noti.SetStatus(models.NotiStatusFailure)
|
||||
noti.SetDetails(err.Error())
|
||||
} else if res != "" {
|
||||
noti.SetDetails(string(res))
|
||||
}
|
||||
|
||||
if !ctx.IsCenter {
|
||||
_, err := poster.PostByUrlsWithResp[int64](ctx, "/v1/n9e/notify-record", noti)
|
||||
if err != nil {
|
||||
logger.Errorf("add noti:%v failed, err: %v", noti, err)
|
||||
}
|
||||
return
|
||||
}
|
||||
|
||||
if err := noti.Add(ctx); err != nil {
|
||||
logger.Errorf("add noti:%v failed, err: %v", noti, err)
|
||||
}
|
||||
}
|
||||
|
||||
func doSend(url string, body interface{}, channel string, stats *astats.Stats) (string, error) {
|
||||
stats.AlertNotifyTotal.WithLabelValues(channel).Inc()
|
||||
|
||||
res, code, err := poster.PostJSON(url, time.Second*5, body, 3)
|
||||
if err != nil {
|
||||
logger.Errorf("%s_sender: result=fail url=%s code=%d error=%v req:%v response=%s", channel, url, code, err, body, string(res))
|
||||
stats.AlertNotifyErrorTotal.WithLabelValues(channel).Inc()
|
||||
} else {
|
||||
logger.Infof("%s_sender: result=succ url=%s code=%d req:%v response=%s", channel, url, code, body, string(res))
|
||||
return "", err
|
||||
}
|
||||
|
||||
logger.Infof("%s_sender: result=succ url=%s code=%d req:%v response=%s", channel, url, code, body, string(res))
|
||||
return string(res), nil
|
||||
}
|
||||
|
||||
type TaskCreateReply struct {
|
||||
Err string `json:"err"`
|
||||
Dat int64 `json:"dat"` // task.id
|
||||
}
|
||||
|
||||
func PushCallbackEvent(ctx *ctx.Context, webhook *models.Webhook, event *models.AlertCurEvent, stats *astats.Stats) {
|
||||
CallbackEventQueueLock.RLock()
|
||||
queue := CallbackEventQueue[webhook.Url]
|
||||
CallbackEventQueueLock.RUnlock()
|
||||
|
||||
if queue == nil {
|
||||
queue = &WebhookQueue{
|
||||
list: NewSafeListLimited(QueueMaxSize),
|
||||
closeCh: make(chan struct{}),
|
||||
}
|
||||
|
||||
CallbackEventQueueLock.Lock()
|
||||
CallbackEventQueue[webhook.Url] = queue
|
||||
CallbackEventQueueLock.Unlock()
|
||||
|
||||
StartConsumer(ctx, queue, webhook.Batch, webhook, stats)
|
||||
}
|
||||
|
||||
succ := queue.list.PushFront(event)
|
||||
if !succ {
|
||||
logger.Warningf("Write channel(%s) full, current channel size: %d event:%v", webhook.Url, queue.list.Len(), event)
|
||||
}
|
||||
}
|
||||
|
||||
@@ -1,9 +1,10 @@
|
||||
package sender
|
||||
|
||||
import (
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
"html/template"
|
||||
"strings"
|
||||
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
)
|
||||
|
||||
type dingtalkMarkdown struct {
|
||||
@@ -35,13 +36,13 @@ func (ds *DingtalkSender) Send(ctx MessageContext) {
|
||||
return
|
||||
}
|
||||
|
||||
urls, ats := ds.extract(ctx.Users)
|
||||
urls, ats, tokens := ds.extract(ctx.Users)
|
||||
if len(urls) == 0 {
|
||||
return
|
||||
}
|
||||
message := BuildTplMessage(models.Dingtalk, ds.tpl, ctx.Events)
|
||||
|
||||
for _, url := range urls {
|
||||
for i, url := range urls {
|
||||
var body dingtalk
|
||||
// NoAt in url
|
||||
if strings.Contains(url, "noat=1") {
|
||||
@@ -66,7 +67,7 @@ func (ds *DingtalkSender) Send(ctx MessageContext) {
|
||||
}
|
||||
}
|
||||
|
||||
doSend(url, body, models.Dingtalk, ctx.Stats)
|
||||
doSendAndRecord(ctx.Ctx, url, tokens[i], body, models.Dingtalk, ctx.Stats, ctx.Events[0])
|
||||
}
|
||||
}
|
||||
|
||||
@@ -96,15 +97,17 @@ func (ds *DingtalkSender) CallBack(ctx CallBackContext) {
|
||||
body.Markdown.Text = message
|
||||
}
|
||||
|
||||
doSend(ctx.CallBackURL, body, models.Dingtalk, ctx.Stats)
|
||||
doSendAndRecord(ctx.Ctx, ctx.CallBackURL, ctx.CallBackURL, body,
|
||||
"callback", ctx.Stats, ctx.Events[0])
|
||||
|
||||
ctx.Stats.AlertNotifyTotal.WithLabelValues("rule_callback").Inc()
|
||||
}
|
||||
|
||||
// extract urls and ats from Users
|
||||
func (ds *DingtalkSender) extract(users []*models.User) ([]string, []string) {
|
||||
func (ds *DingtalkSender) extract(users []*models.User) ([]string, []string, []string) {
|
||||
urls := make([]string, 0, len(users))
|
||||
ats := make([]string, 0, len(users))
|
||||
tokens := make([]string, 0, len(users))
|
||||
|
||||
for _, user := range users {
|
||||
if user.Phone != "" {
|
||||
@@ -116,7 +119,8 @@ func (ds *DingtalkSender) extract(users []*models.User) ([]string, []string) {
|
||||
url = "https://oapi.dingtalk.com/robot/send?access_token=" + token
|
||||
}
|
||||
urls = append(urls, url)
|
||||
tokens = append(tokens, token)
|
||||
}
|
||||
}
|
||||
return urls, ats
|
||||
return urls, ats, tokens
|
||||
}
|
||||
|
||||
@@ -9,13 +9,14 @@ import (
|
||||
"github.com/ccfos/nightingale/v6/alert/aconf"
|
||||
"github.com/ccfos/nightingale/v6/memsto"
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
"github.com/ccfos/nightingale/v6/pkg/ctx"
|
||||
|
||||
"github.com/toolkits/pkg/logger"
|
||||
|
||||
"gopkg.in/gomail.v2"
|
||||
)
|
||||
|
||||
var mailch chan *gomail.Message
|
||||
var mailch chan *EmailContext
|
||||
|
||||
type EmailSender struct {
|
||||
subjectTpl *template.Template
|
||||
@@ -23,6 +24,11 @@ type EmailSender struct {
|
||||
smtp aconf.SMTPConfig
|
||||
}
|
||||
|
||||
type EmailContext struct {
|
||||
event *models.AlertCurEvent
|
||||
mail *gomail.Message
|
||||
}
|
||||
|
||||
func (es *EmailSender) Send(ctx MessageContext) {
|
||||
if len(ctx.Users) == 0 || len(ctx.Events) == 0 {
|
||||
return
|
||||
@@ -36,7 +42,7 @@ func (es *EmailSender) Send(ctx MessageContext) {
|
||||
subject = ctx.Events[0].RuleName
|
||||
}
|
||||
content := BuildTplMessage(models.Email, es.contentTpl, ctx.Events)
|
||||
es.WriteEmail(subject, content, tos)
|
||||
es.WriteEmail(subject, content, tos, ctx.Events[0])
|
||||
|
||||
ctx.Stats.AlertNotifyTotal.WithLabelValues(models.Email).Add(float64(len(tos)))
|
||||
}
|
||||
@@ -73,7 +79,8 @@ func SendEmail(subject, content string, tos []string, stmp aconf.SMTPConfig) err
|
||||
return nil
|
||||
}
|
||||
|
||||
func (es *EmailSender) WriteEmail(subject, content string, tos []string) {
|
||||
func (es *EmailSender) WriteEmail(subject, content string, tos []string,
|
||||
event *models.AlertCurEvent) {
|
||||
m := gomail.NewMessage()
|
||||
|
||||
m.SetHeader("From", es.smtp.From)
|
||||
@@ -81,7 +88,7 @@ func (es *EmailSender) WriteEmail(subject, content string, tos []string) {
|
||||
m.SetHeader("Subject", subject)
|
||||
m.SetBody("text/html", content)
|
||||
|
||||
mailch <- m
|
||||
mailch <- &EmailContext{event, m}
|
||||
}
|
||||
|
||||
func dialSmtp(d *gomail.Dialer) gomail.SendCloser {
|
||||
@@ -104,22 +111,22 @@ func dialSmtp(d *gomail.Dialer) gomail.SendCloser {
|
||||
|
||||
var mailQuit = make(chan struct{})
|
||||
|
||||
func RestartEmailSender(smtp aconf.SMTPConfig) {
|
||||
func RestartEmailSender(ctx *ctx.Context, smtp aconf.SMTPConfig) {
|
||||
// Notify internal start exit
|
||||
mailQuit <- struct{}{}
|
||||
startEmailSender(smtp)
|
||||
startEmailSender(ctx, smtp)
|
||||
}
|
||||
|
||||
var smtpConfig aconf.SMTPConfig
|
||||
|
||||
func InitEmailSender(ncc *memsto.NotifyConfigCacheType) {
|
||||
mailch = make(chan *gomail.Message, 100000)
|
||||
go updateSmtp(ncc)
|
||||
func InitEmailSender(ctx *ctx.Context, ncc *memsto.NotifyConfigCacheType) {
|
||||
mailch = make(chan *EmailContext, 100000)
|
||||
go updateSmtp(ctx, ncc)
|
||||
smtpConfig = ncc.GetSMTP()
|
||||
startEmailSender(smtpConfig)
|
||||
startEmailSender(ctx, smtpConfig)
|
||||
}
|
||||
|
||||
func updateSmtp(ncc *memsto.NotifyConfigCacheType) {
|
||||
func updateSmtp(ctx *ctx.Context, ncc *memsto.NotifyConfigCacheType) {
|
||||
for {
|
||||
time.Sleep(1 * time.Minute)
|
||||
smtp := ncc.GetSMTP()
|
||||
@@ -127,12 +134,12 @@ func updateSmtp(ncc *memsto.NotifyConfigCacheType) {
|
||||
smtpConfig.Pass != smtp.Pass || smtpConfig.User != smtp.User || smtpConfig.Port != smtp.Port ||
|
||||
smtpConfig.InsecureSkipVerify != smtp.InsecureSkipVerify { //diff
|
||||
smtpConfig = smtp
|
||||
RestartEmailSender(smtp)
|
||||
RestartEmailSender(ctx, smtp)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func startEmailSender(smtp aconf.SMTPConfig) {
|
||||
func startEmailSender(ctx *ctx.Context, smtp aconf.SMTPConfig) {
|
||||
conf := smtp
|
||||
if conf.Host == "" || conf.Port == 0 {
|
||||
logger.Warning("SMTP configurations invalid")
|
||||
@@ -167,7 +174,8 @@ func startEmailSender(smtp aconf.SMTPConfig) {
|
||||
}
|
||||
open = true
|
||||
}
|
||||
if err := gomail.Send(s, m); err != nil {
|
||||
var err error
|
||||
if err = gomail.Send(s, m.mail); err != nil {
|
||||
logger.Errorf("email_sender: failed to send: %s", err)
|
||||
|
||||
// close and retry
|
||||
@@ -184,11 +192,16 @@ func startEmailSender(smtp aconf.SMTPConfig) {
|
||||
}
|
||||
open = true
|
||||
|
||||
if err := gomail.Send(s, m); err != nil {
|
||||
if err = gomail.Send(s, m.mail); err != nil {
|
||||
logger.Errorf("email_sender: failed to retry send: %s", err)
|
||||
}
|
||||
} else {
|
||||
logger.Infof("email_sender: result=succ subject=%v to=%v", m.GetHeader("Subject"), m.GetHeader("To"))
|
||||
logger.Infof("email_sender: result=succ subject=%v to=%v",
|
||||
m.mail.GetHeader("Subject"), m.mail.GetHeader("To"))
|
||||
}
|
||||
|
||||
for _, to := range m.mail.GetHeader("To") {
|
||||
NotifyRecord(ctx, m.event, models.Email, to, "", err)
|
||||
}
|
||||
|
||||
size++
|
||||
|
||||
@@ -54,7 +54,8 @@ func (fs *FeishuSender) CallBack(ctx CallBackContext) {
|
||||
},
|
||||
}
|
||||
|
||||
doSend(ctx.CallBackURL, body, models.Feishu, ctx.Stats)
|
||||
doSendAndRecord(ctx.Ctx, ctx.CallBackURL, ctx.CallBackURL, body, "callback",
|
||||
ctx.Stats, ctx.Events[0])
|
||||
ctx.Stats.AlertNotifyTotal.WithLabelValues("rule_callback").Inc()
|
||||
}
|
||||
|
||||
@@ -62,9 +63,9 @@ func (fs *FeishuSender) Send(ctx MessageContext) {
|
||||
if len(ctx.Users) == 0 || len(ctx.Events) == 0 {
|
||||
return
|
||||
}
|
||||
urls, ats := fs.extract(ctx.Users)
|
||||
urls, ats, tokens := fs.extract(ctx.Users)
|
||||
message := BuildTplMessage(models.Feishu, fs.tpl, ctx.Events)
|
||||
for _, url := range urls {
|
||||
for i, url := range urls {
|
||||
body := feishu{
|
||||
Msgtype: "text",
|
||||
Content: feishuContent{
|
||||
@@ -77,13 +78,14 @@ func (fs *FeishuSender) Send(ctx MessageContext) {
|
||||
IsAtAll: false,
|
||||
}
|
||||
}
|
||||
doSend(url, body, models.Feishu, ctx.Stats)
|
||||
doSendAndRecord(ctx.Ctx, url, tokens[i], body, models.Feishu, ctx.Stats, ctx.Events[0])
|
||||
}
|
||||
}
|
||||
|
||||
func (fs *FeishuSender) extract(users []*models.User) ([]string, []string) {
|
||||
func (fs *FeishuSender) extract(users []*models.User) ([]string, []string, []string) {
|
||||
urls := make([]string, 0, len(users))
|
||||
ats := make([]string, 0, len(users))
|
||||
tokens := make([]string, 0, len(users))
|
||||
|
||||
for _, user := range users {
|
||||
if user.Phone != "" {
|
||||
@@ -95,7 +97,8 @@ func (fs *FeishuSender) extract(users []*models.User) ([]string, []string) {
|
||||
url = "https://open.feishu.cn/open-apis/bot/v2/hook/" + token
|
||||
}
|
||||
urls = append(urls, url)
|
||||
tokens = append(tokens, token)
|
||||
}
|
||||
}
|
||||
return urls, ats
|
||||
return urls, ats, tokens
|
||||
}
|
||||
|
||||
@@ -134,14 +134,15 @@ func (fs *FeishuCardSender) CallBack(ctx CallBackContext) {
|
||||
}
|
||||
parsedURL.RawQuery = ""
|
||||
|
||||
doSend(parsedURL.String(), body, models.FeishuCard, ctx.Stats)
|
||||
doSendAndRecord(ctx.Ctx, parsedURL.String(), parsedURL.String(), body, "callback",
|
||||
ctx.Stats, ctx.Events[0])
|
||||
}
|
||||
|
||||
func (fs *FeishuCardSender) Send(ctx MessageContext) {
|
||||
if len(ctx.Users) == 0 || len(ctx.Events) == 0 {
|
||||
return
|
||||
}
|
||||
urls, _ := fs.extract(ctx.Users)
|
||||
urls, tokens := fs.extract(ctx.Users)
|
||||
message := BuildTplMessage(models.FeishuCard, fs.tpl, ctx.Events)
|
||||
color := "red"
|
||||
lowerUnicode := strings.ToLower(message)
|
||||
@@ -156,14 +157,15 @@ func (fs *FeishuCardSender) Send(ctx MessageContext) {
|
||||
body.Card.Header.Template = color
|
||||
body.Card.Elements[0].Text.Content = message
|
||||
body.Card.Elements[2].Elements[0].Content = SendTitle
|
||||
for _, url := range urls {
|
||||
doSend(url, body, models.FeishuCard, ctx.Stats)
|
||||
for i, url := range urls {
|
||||
doSendAndRecord(ctx.Ctx, url, tokens[i], body, models.FeishuCard,
|
||||
ctx.Stats, ctx.Events[0])
|
||||
}
|
||||
}
|
||||
|
||||
func (fs *FeishuCardSender) extract(users []*models.User) ([]string, []string) {
|
||||
urls := make([]string, 0, len(users))
|
||||
ats := make([]string, 0)
|
||||
tokens := make([]string, 0, len(users))
|
||||
for i := range users {
|
||||
if token, has := users[i].ExtractToken(models.FeishuCard); has {
|
||||
url := token
|
||||
@@ -171,7 +173,8 @@ func (fs *FeishuCardSender) extract(users []*models.User) ([]string, []string) {
|
||||
url = "https://open.feishu.cn/open-apis/bot/v2/hook/" + strings.TrimSpace(token)
|
||||
}
|
||||
urls = append(urls, url)
|
||||
tokens = append(tokens, token)
|
||||
}
|
||||
}
|
||||
return urls, ats
|
||||
return urls, tokens
|
||||
}
|
||||
|
||||
@@ -77,15 +77,20 @@ func (c *IbexCallBacker) handleIbex(ctx *ctx.Context, url string, event *models.
|
||||
return
|
||||
}
|
||||
|
||||
tpl := c.taskTplCache.Get(id)
|
||||
CallIbex(ctx, id, host, c.taskTplCache, c.targetCache, c.userCache, event)
|
||||
}
|
||||
|
||||
func CallIbex(ctx *ctx.Context, id int64, host string,
|
||||
taskTplCache *memsto.TaskTplCache, targetCache *memsto.TargetCacheType,
|
||||
userCache *memsto.UserCacheType, event *models.AlertCurEvent) {
|
||||
tpl := taskTplCache.Get(id)
|
||||
if tpl == nil {
|
||||
logger.Errorf("event_callback_ibex: no such tpl(%d)", id)
|
||||
return
|
||||
}
|
||||
|
||||
// check perm
|
||||
// tpl.GroupId - host - account 三元组校验权限
|
||||
can, err := canDoIbex(tpl.UpdateBy, tpl, host, c.targetCache, c.userCache)
|
||||
can, err := canDoIbex(tpl.UpdateBy, tpl, host, targetCache, userCache)
|
||||
if err != nil {
|
||||
logger.Errorf("event_callback_ibex: check perm fail: %v", err)
|
||||
return
|
||||
|
||||
@@ -27,7 +27,8 @@ func (lk *LarkSender) CallBack(ctx CallBackContext) {
|
||||
},
|
||||
}
|
||||
|
||||
doSend(ctx.CallBackURL, body, models.Lark, ctx.Stats)
|
||||
doSendAndRecord(ctx.Ctx, ctx.CallBackURL, ctx.CallBackURL, body, "callback",
|
||||
ctx.Stats, ctx.Events[0])
|
||||
ctx.Stats.AlertNotifyTotal.WithLabelValues("rule_callback").Inc()
|
||||
}
|
||||
|
||||
|
||||
@@ -55,7 +55,8 @@ func (fs *LarkCardSender) CallBack(ctx CallBackContext) {
|
||||
}
|
||||
parsedURL.RawQuery = ""
|
||||
|
||||
doSend(parsedURL.String(), body, models.LarkCard, ctx.Stats)
|
||||
doSendAndRecord(ctx.Ctx, ctx.CallBackURL, ctx.CallBackURL, body, "callback",
|
||||
ctx.Stats, ctx.Events[0])
|
||||
}
|
||||
|
||||
func (fs *LarkCardSender) Send(ctx MessageContext) {
|
||||
|
||||
@@ -7,6 +7,7 @@ import (
|
||||
|
||||
"github.com/ccfos/nightingale/v6/alert/astats"
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
"github.com/ccfos/nightingale/v6/pkg/ctx"
|
||||
|
||||
"github.com/toolkits/pkg/logger"
|
||||
)
|
||||
@@ -38,11 +39,11 @@ func (ms *MmSender) Send(ctx MessageContext) {
|
||||
}
|
||||
message := BuildTplMessage(models.Mm, ms.tpl, ctx.Events)
|
||||
|
||||
SendMM(MatterMostMessage{
|
||||
SendMM(ctx.Ctx, MatterMostMessage{
|
||||
Text: message,
|
||||
Tokens: urls,
|
||||
Stats: ctx.Stats,
|
||||
})
|
||||
}, ctx.Events[0])
|
||||
}
|
||||
|
||||
func (ms *MmSender) CallBack(ctx CallBackContext) {
|
||||
@@ -51,11 +52,11 @@ func (ms *MmSender) CallBack(ctx CallBackContext) {
|
||||
}
|
||||
message := BuildTplMessage(models.Mm, ms.tpl, ctx.Events)
|
||||
|
||||
SendMM(MatterMostMessage{
|
||||
SendMM(ctx.Ctx, MatterMostMessage{
|
||||
Text: message,
|
||||
Tokens: []string{ctx.CallBackURL},
|
||||
Stats: ctx.Stats,
|
||||
})
|
||||
}, ctx.Events[0])
|
||||
|
||||
ctx.Stats.AlertNotifyTotal.WithLabelValues("rule_callback").Inc()
|
||||
}
|
||||
@@ -70,7 +71,7 @@ func (ms *MmSender) extract(users []*models.User) []string {
|
||||
return tokens
|
||||
}
|
||||
|
||||
func SendMM(message MatterMostMessage) {
|
||||
func SendMM(ctx *ctx.Context, message MatterMostMessage, event *models.AlertCurEvent) {
|
||||
for i := 0; i < len(message.Tokens); i++ {
|
||||
u, err := url.Parse(message.Tokens[i])
|
||||
if err != nil {
|
||||
@@ -103,7 +104,7 @@ func SendMM(message MatterMostMessage) {
|
||||
Username: username,
|
||||
Text: txt + message.Text,
|
||||
}
|
||||
doSend(ur, body, models.Mm, message.Stats)
|
||||
doSendAndRecord(ctx, ur, message.Tokens[i], body, models.Mm, message.Stats, event)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
@@ -2,26 +2,30 @@ package sender
|
||||
|
||||
import (
|
||||
"bytes"
|
||||
"fmt"
|
||||
"os"
|
||||
"os/exec"
|
||||
"time"
|
||||
|
||||
"github.com/ccfos/nightingale/v6/alert/astats"
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
"github.com/ccfos/nightingale/v6/pkg/ctx"
|
||||
|
||||
"github.com/toolkits/pkg/file"
|
||||
"github.com/toolkits/pkg/logger"
|
||||
"github.com/toolkits/pkg/sys"
|
||||
)
|
||||
|
||||
func MayPluginNotify(noticeBytes []byte, notifyScript models.NotifyScript, stats *astats.Stats) {
|
||||
func MayPluginNotify(ctx *ctx.Context, noticeBytes []byte, notifyScript models.NotifyScript,
|
||||
stats *astats.Stats, event *models.AlertCurEvent) {
|
||||
if len(noticeBytes) == 0 {
|
||||
return
|
||||
}
|
||||
alertingCallScript(noticeBytes, notifyScript, stats)
|
||||
alertingCallScript(ctx, noticeBytes, notifyScript, stats, event)
|
||||
}
|
||||
|
||||
func alertingCallScript(stdinBytes []byte, notifyScript models.NotifyScript, stats *astats.Stats) {
|
||||
func alertingCallScript(ctx *ctx.Context, stdinBytes []byte, notifyScript models.NotifyScript,
|
||||
stats *astats.Stats, event *models.AlertCurEvent) {
|
||||
// not enable or no notify.py? do nothing
|
||||
config := notifyScript
|
||||
if !config.Enable || config.Content == "" {
|
||||
@@ -81,6 +85,7 @@ func alertingCallScript(stdinBytes []byte, notifyScript models.NotifyScript, sta
|
||||
}
|
||||
|
||||
err, isTimeout := sys.WrapTimeout(cmd, time.Duration(config.Timeout)*time.Second)
|
||||
NotifyRecord(ctx, event, channel, cmd.String(), "", buildErr(err, isTimeout))
|
||||
|
||||
if isTimeout {
|
||||
if err == nil {
|
||||
@@ -102,3 +107,11 @@ func alertingCallScript(stdinBytes []byte, notifyScript models.NotifyScript, sta
|
||||
|
||||
logger.Infof("event_script_notify_ok: exec %s output: %s", fpath, buf.String())
|
||||
}
|
||||
|
||||
func buildErr(err error, isTimeout bool) error {
|
||||
if err == nil && !isTimeout {
|
||||
return nil
|
||||
} else {
|
||||
return fmt.Errorf("is_timeout: %v, err: %v", isTimeout, err)
|
||||
}
|
||||
}
|
||||
|
||||
@@ -8,6 +8,7 @@ import (
|
||||
"github.com/ccfos/nightingale/v6/alert/astats"
|
||||
"github.com/ccfos/nightingale/v6/memsto"
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
"github.com/ccfos/nightingale/v6/pkg/ctx"
|
||||
)
|
||||
|
||||
type (
|
||||
@@ -22,6 +23,7 @@ type (
|
||||
Rule *models.AlertRule
|
||||
Events []*models.AlertCurEvent
|
||||
Stats *astats.Stats
|
||||
Ctx *ctx.Context
|
||||
}
|
||||
)
|
||||
|
||||
@@ -49,13 +51,15 @@ func NewSender(key string, tpls map[string]*template.Template, smtp ...aconf.SMT
|
||||
return nil
|
||||
}
|
||||
|
||||
func BuildMessageContext(rule *models.AlertRule, events []*models.AlertCurEvent, uids []int64, userCache *memsto.UserCacheType, stats *astats.Stats) MessageContext {
|
||||
func BuildMessageContext(ctx *ctx.Context, rule *models.AlertRule, events []*models.AlertCurEvent,
|
||||
uids []int64, userCache *memsto.UserCacheType, stats *astats.Stats) MessageContext {
|
||||
users := userCache.GetByUserIds(uids)
|
||||
return MessageContext{
|
||||
Rule: rule,
|
||||
Events: events,
|
||||
Users: users,
|
||||
Stats: stats,
|
||||
Ctx: ctx,
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
@@ -6,6 +6,7 @@ import (
|
||||
|
||||
"github.com/ccfos/nightingale/v6/alert/astats"
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
"github.com/ccfos/nightingale/v6/pkg/ctx"
|
||||
|
||||
"github.com/toolkits/pkg/logger"
|
||||
)
|
||||
@@ -35,11 +36,11 @@ func (ts *TelegramSender) CallBack(ctx CallBackContext) {
|
||||
}
|
||||
|
||||
message := BuildTplMessage(models.Telegram, ts.tpl, ctx.Events)
|
||||
SendTelegram(TelegramMessage{
|
||||
SendTelegram(ctx.Ctx, TelegramMessage{
|
||||
Text: message,
|
||||
Tokens: []string{ctx.CallBackURL},
|
||||
Stats: ctx.Stats,
|
||||
})
|
||||
}, ctx.Events[0])
|
||||
|
||||
ctx.Stats.AlertNotifyTotal.WithLabelValues("rule_callback").Inc()
|
||||
}
|
||||
@@ -51,11 +52,11 @@ func (ts *TelegramSender) Send(ctx MessageContext) {
|
||||
tokens := ts.extract(ctx.Users)
|
||||
message := BuildTplMessage(models.Telegram, ts.tpl, ctx.Events)
|
||||
|
||||
SendTelegram(TelegramMessage{
|
||||
SendTelegram(ctx.Ctx, TelegramMessage{
|
||||
Text: message,
|
||||
Tokens: tokens,
|
||||
Stats: ctx.Stats,
|
||||
})
|
||||
}, ctx.Events[0])
|
||||
}
|
||||
|
||||
func (ts *TelegramSender) extract(users []*models.User) []string {
|
||||
@@ -68,7 +69,7 @@ func (ts *TelegramSender) extract(users []*models.User) []string {
|
||||
return tokens
|
||||
}
|
||||
|
||||
func SendTelegram(message TelegramMessage) {
|
||||
func SendTelegram(ctx *ctx.Context, message TelegramMessage, event *models.AlertCurEvent) {
|
||||
for i := 0; i < len(message.Tokens); i++ {
|
||||
if !strings.Contains(message.Tokens[i], "/") && !strings.HasPrefix(message.Tokens[i], "https://") {
|
||||
logger.Errorf("telegram_sender: result=fail invalid token=%s", message.Tokens[i])
|
||||
@@ -92,6 +93,6 @@ func SendTelegram(message TelegramMessage) {
|
||||
Text: message.Text,
|
||||
}
|
||||
|
||||
doSend(url, body, models.Telegram, message.Stats)
|
||||
doSendAndRecord(ctx, url, message.Tokens[i], body, models.Telegram, message.Stats, event)
|
||||
}
|
||||
}
|
||||
|
||||
@@ -4,33 +4,41 @@ import (
|
||||
"bytes"
|
||||
"crypto/tls"
|
||||
"encoding/json"
|
||||
"fmt"
|
||||
"io"
|
||||
"net/http"
|
||||
"sync"
|
||||
"time"
|
||||
|
||||
"github.com/ccfos/nightingale/v6/alert/astats"
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
"github.com/ccfos/nightingale/v6/pkg/ctx"
|
||||
|
||||
"github.com/toolkits/pkg/logger"
|
||||
)
|
||||
|
||||
func sendWebhook(webhook *models.Webhook, event *models.AlertCurEvent, stats *astats.Stats) bool {
|
||||
func sendWebhook(webhook *models.Webhook, event interface{}, stats *astats.Stats) (bool, string, error) {
|
||||
channel := "webhook"
|
||||
if webhook.Type == models.RuleCallback {
|
||||
channel = "callback"
|
||||
}
|
||||
|
||||
conf := webhook
|
||||
if conf.Url == "" || !conf.Enable {
|
||||
return false
|
||||
return false, "", nil
|
||||
}
|
||||
bs, err := json.Marshal(event)
|
||||
if err != nil {
|
||||
logger.Errorf("alertingWebhook failed to marshal event:%+v err:%v", event, err)
|
||||
return false
|
||||
logger.Errorf("%s alertingWebhook failed to marshal event:%+v err:%v", channel, event, err)
|
||||
return false, "", err
|
||||
}
|
||||
|
||||
bf := bytes.NewBuffer(bs)
|
||||
|
||||
req, err := http.NewRequest("POST", conf.Url, bf)
|
||||
if err != nil {
|
||||
logger.Warningf("alertingWebhook failed to new reques event:%+v err:%v", event, err)
|
||||
return true
|
||||
logger.Warningf("%s alertingWebhook failed to new reques event:%s err:%v", channel, string(bs), err)
|
||||
return true, "", err
|
||||
}
|
||||
|
||||
req.Header.Set("Content-Type", "application/json")
|
||||
@@ -58,35 +66,37 @@ func sendWebhook(webhook *models.Webhook, event *models.AlertCurEvent, stats *as
|
||||
},
|
||||
}
|
||||
|
||||
stats.AlertNotifyTotal.WithLabelValues("webhook").Inc()
|
||||
stats.AlertNotifyTotal.WithLabelValues(channel).Inc()
|
||||
var resp *http.Response
|
||||
var body []byte
|
||||
resp, err = client.Do(req)
|
||||
|
||||
if err != nil {
|
||||
stats.AlertNotifyErrorTotal.WithLabelValues("webhook").Inc()
|
||||
logger.Errorf("event_webhook_fail, ruleId: [%d], eventId: [%d], event:%+v, url: [%s], error: [%s]", event.RuleId, event.Id, event, conf.Url, err)
|
||||
return true
|
||||
stats.AlertNotifyErrorTotal.WithLabelValues(channel).Inc()
|
||||
logger.Errorf("event_%s_fail, event:%s, url: [%s], error: [%s]", channel, string(bs), conf.Url, err)
|
||||
return true, "", err
|
||||
}
|
||||
|
||||
var body []byte
|
||||
if resp.Body != nil {
|
||||
defer resp.Body.Close()
|
||||
body, _ = io.ReadAll(resp.Body)
|
||||
}
|
||||
|
||||
if resp.StatusCode == 429 {
|
||||
logger.Errorf("event_webhook_fail, url: %s, response code: %d, body: %s event:%+v", conf.Url, resp.StatusCode, string(body), event)
|
||||
return true
|
||||
logger.Errorf("event_%s_fail, url: %s, response code: %d, body: %s event:%s", channel, conf.Url, resp.StatusCode, string(body), string(bs))
|
||||
return true, string(body), fmt.Errorf("status code is 429")
|
||||
}
|
||||
|
||||
logger.Debugf("event_webhook_succ, url: %s, response code: %d, body: %s event:%+v", conf.Url, resp.StatusCode, string(body), event)
|
||||
return false
|
||||
logger.Debugf("event_%s_succ, url: %s, response code: %d, body: %s event:%s", channel, conf.Url, resp.StatusCode, string(body), string(bs))
|
||||
return false, string(body), nil
|
||||
}
|
||||
|
||||
func SendWebhooks(webhooks []*models.Webhook, event *models.AlertCurEvent, stats *astats.Stats) {
|
||||
func SingleSendWebhooks(ctx *ctx.Context, webhooks []*models.Webhook, event *models.AlertCurEvent, stats *astats.Stats) {
|
||||
for _, conf := range webhooks {
|
||||
retryCount := 0
|
||||
for retryCount < 3 {
|
||||
needRetry := sendWebhook(conf, event, stats)
|
||||
needRetry, res, err := sendWebhook(conf, event, stats)
|
||||
NotifyRecord(ctx, event, "webhook", conf.Url, res, err)
|
||||
if !needRetry {
|
||||
break
|
||||
}
|
||||
@@ -95,3 +105,81 @@ func SendWebhooks(webhooks []*models.Webhook, event *models.AlertCurEvent, stats
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func BatchSendWebhooks(ctx *ctx.Context, webhooks []*models.Webhook, event *models.AlertCurEvent, stats *astats.Stats) {
|
||||
for _, conf := range webhooks {
|
||||
logger.Infof("push event:%+v to queue:%v", event, conf)
|
||||
PushEvent(ctx, conf, event, stats)
|
||||
}
|
||||
}
|
||||
|
||||
var EventQueue = make(map[string]*WebhookQueue)
|
||||
var CallbackEventQueue = make(map[string]*WebhookQueue)
|
||||
var CallbackEventQueueLock sync.RWMutex
|
||||
var EventQueueLock sync.RWMutex
|
||||
|
||||
const QueueMaxSize = 100000
|
||||
|
||||
type WebhookQueue struct {
|
||||
list *SafeListLimited
|
||||
closeCh chan struct{}
|
||||
}
|
||||
|
||||
func PushEvent(ctx *ctx.Context, webhook *models.Webhook, event *models.AlertCurEvent, stats *astats.Stats) {
|
||||
EventQueueLock.RLock()
|
||||
queue := EventQueue[webhook.Url]
|
||||
EventQueueLock.RUnlock()
|
||||
|
||||
if queue == nil {
|
||||
queue = &WebhookQueue{
|
||||
list: NewSafeListLimited(QueueMaxSize),
|
||||
closeCh: make(chan struct{}),
|
||||
}
|
||||
|
||||
EventQueueLock.Lock()
|
||||
EventQueue[webhook.Url] = queue
|
||||
EventQueueLock.Unlock()
|
||||
|
||||
StartConsumer(ctx, queue, webhook.Batch, webhook, stats)
|
||||
}
|
||||
|
||||
succ := queue.list.PushFront(event)
|
||||
if !succ {
|
||||
stats.AlertNotifyErrorTotal.WithLabelValues("push_event_queue").Inc()
|
||||
logger.Warningf("Write channel(%s) full, current channel size: %d event:%v", webhook.Url, queue.list.Len(), event)
|
||||
}
|
||||
}
|
||||
|
||||
func StartConsumer(ctx *ctx.Context, queue *WebhookQueue, popSize int, webhook *models.Webhook, stats *astats.Stats) {
|
||||
for {
|
||||
select {
|
||||
case <-queue.closeCh:
|
||||
logger.Infof("event queue:%v closed", queue)
|
||||
return
|
||||
default:
|
||||
events := queue.list.PopBack(popSize)
|
||||
if len(events) == 0 {
|
||||
time.Sleep(time.Millisecond * 400)
|
||||
continue
|
||||
}
|
||||
|
||||
retryCount := 0
|
||||
for retryCount < webhook.RetryCount {
|
||||
needRetry, res, err := sendWebhook(webhook, events, stats)
|
||||
if !needRetry {
|
||||
break
|
||||
}
|
||||
go RecordEvents(ctx, webhook, events, stats, res, err)
|
||||
retryCount++
|
||||
time.Sleep(time.Second * time.Duration(webhook.RetryInterval) * time.Duration(retryCount))
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func RecordEvents(ctx *ctx.Context, webhook *models.Webhook, events []*models.AlertCurEvent, stats *astats.Stats, res string, err error) {
|
||||
for _, event := range events {
|
||||
time.Sleep(time.Millisecond * 10)
|
||||
NotifyRecord(ctx, event, "webhook", webhook.Url, res, err)
|
||||
}
|
||||
}
|
||||
|
||||
111
alert/sender/webhook_queue.go
Normal file
111
alert/sender/webhook_queue.go
Normal file
@@ -0,0 +1,111 @@
|
||||
package sender
|
||||
|
||||
import (
|
||||
"container/list"
|
||||
"sync"
|
||||
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
)
|
||||
|
||||
type SafeList struct {
|
||||
sync.RWMutex
|
||||
L *list.List
|
||||
}
|
||||
|
||||
func NewSafeList() *SafeList {
|
||||
return &SafeList{L: list.New()}
|
||||
}
|
||||
|
||||
func (sl *SafeList) PushFront(v interface{}) *list.Element {
|
||||
sl.Lock()
|
||||
e := sl.L.PushFront(v)
|
||||
sl.Unlock()
|
||||
return e
|
||||
}
|
||||
|
||||
func (sl *SafeList) PushFrontBatch(vs []interface{}) {
|
||||
sl.Lock()
|
||||
for _, item := range vs {
|
||||
sl.L.PushFront(item)
|
||||
}
|
||||
sl.Unlock()
|
||||
}
|
||||
|
||||
func (sl *SafeList) PopBack(max int) []*models.AlertCurEvent {
|
||||
sl.Lock()
|
||||
|
||||
count := sl.L.Len()
|
||||
if count == 0 {
|
||||
sl.Unlock()
|
||||
return []*models.AlertCurEvent{}
|
||||
}
|
||||
|
||||
if count > max {
|
||||
count = max
|
||||
}
|
||||
|
||||
items := make([]*models.AlertCurEvent, 0, count)
|
||||
for i := 0; i < count; i++ {
|
||||
item := sl.L.Remove(sl.L.Back())
|
||||
sample, ok := item.(*models.AlertCurEvent)
|
||||
if ok {
|
||||
items = append(items, sample)
|
||||
}
|
||||
}
|
||||
|
||||
sl.Unlock()
|
||||
return items
|
||||
}
|
||||
|
||||
func (sl *SafeList) RemoveAll() {
|
||||
sl.Lock()
|
||||
sl.L.Init()
|
||||
sl.Unlock()
|
||||
}
|
||||
|
||||
func (sl *SafeList) Len() int {
|
||||
sl.RLock()
|
||||
size := sl.L.Len()
|
||||
sl.RUnlock()
|
||||
return size
|
||||
}
|
||||
|
||||
// SafeList with Limited Size
|
||||
type SafeListLimited struct {
|
||||
maxSize int
|
||||
SL *SafeList
|
||||
}
|
||||
|
||||
func NewSafeListLimited(maxSize int) *SafeListLimited {
|
||||
return &SafeListLimited{SL: NewSafeList(), maxSize: maxSize}
|
||||
}
|
||||
|
||||
func (sll *SafeListLimited) PopBack(max int) []*models.AlertCurEvent {
|
||||
return sll.SL.PopBack(max)
|
||||
}
|
||||
|
||||
func (sll *SafeListLimited) PushFront(v interface{}) bool {
|
||||
if sll.SL.Len() >= sll.maxSize {
|
||||
return false
|
||||
}
|
||||
|
||||
sll.SL.PushFront(v)
|
||||
return true
|
||||
}
|
||||
|
||||
func (sll *SafeListLimited) PushFrontBatch(vs []interface{}) bool {
|
||||
if sll.SL.Len() >= sll.maxSize {
|
||||
return false
|
||||
}
|
||||
|
||||
sll.SL.PushFrontBatch(vs)
|
||||
return true
|
||||
}
|
||||
|
||||
func (sll *SafeListLimited) RemoveAll() {
|
||||
sll.SL.RemoveAll()
|
||||
}
|
||||
|
||||
func (sll *SafeListLimited) Len() int {
|
||||
return sll.SL.Len()
|
||||
}
|
||||
@@ -37,7 +37,8 @@ func (ws *WecomSender) CallBack(ctx CallBackContext) {
|
||||
},
|
||||
}
|
||||
|
||||
doSend(ctx.CallBackURL, body, models.Wecom, ctx.Stats)
|
||||
doSendAndRecord(ctx.Ctx, ctx.CallBackURL, ctx.CallBackURL, body, "callback",
|
||||
ctx.Stats, ctx.Events[0])
|
||||
ctx.Stats.AlertNotifyTotal.WithLabelValues("rule_callback").Inc()
|
||||
}
|
||||
|
||||
@@ -45,21 +46,22 @@ func (ws *WecomSender) Send(ctx MessageContext) {
|
||||
if len(ctx.Users) == 0 || len(ctx.Events) == 0 {
|
||||
return
|
||||
}
|
||||
urls := ws.extract(ctx.Users)
|
||||
urls, tokens := ws.extract(ctx.Users)
|
||||
message := BuildTplMessage(models.Wecom, ws.tpl, ctx.Events)
|
||||
for _, url := range urls {
|
||||
for i, url := range urls {
|
||||
body := wecom{
|
||||
Msgtype: "markdown",
|
||||
Markdown: wecomMarkdown{
|
||||
Content: message,
|
||||
},
|
||||
}
|
||||
doSend(url, body, models.Wecom, ctx.Stats)
|
||||
doSendAndRecord(ctx.Ctx, url, tokens[i], body, models.Wecom, ctx.Stats, ctx.Events[0])
|
||||
}
|
||||
}
|
||||
|
||||
func (ws *WecomSender) extract(users []*models.User) []string {
|
||||
func (ws *WecomSender) extract(users []*models.User) ([]string, []string) {
|
||||
urls := make([]string, 0, len(users))
|
||||
tokens := make([]string, 0, len(users))
|
||||
for _, user := range users {
|
||||
if token, has := user.ExtractToken(models.Wecom); has {
|
||||
url := token
|
||||
@@ -67,7 +69,8 @@ func (ws *WecomSender) extract(users []*models.User) []string {
|
||||
url = "https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=" + token
|
||||
}
|
||||
urls = append(urls, url)
|
||||
tokens = append(tokens, token)
|
||||
}
|
||||
}
|
||||
return urls
|
||||
return urls, tokens
|
||||
}
|
||||
|
||||
@@ -13,6 +13,7 @@ type Center struct {
|
||||
UseFileAssets bool
|
||||
FlashDuty FlashDuty
|
||||
EventHistoryGroupView bool
|
||||
CleanNotifyRecordDay int
|
||||
}
|
||||
|
||||
type Plugin struct {
|
||||
|
||||
@@ -16,6 +16,7 @@ import (
|
||||
centerrt "github.com/ccfos/nightingale/v6/center/router"
|
||||
"github.com/ccfos/nightingale/v6/center/sso"
|
||||
"github.com/ccfos/nightingale/v6/conf"
|
||||
"github.com/ccfos/nightingale/v6/cron"
|
||||
"github.com/ccfos/nightingale/v6/dumper"
|
||||
"github.com/ccfos/nightingale/v6/memsto"
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
@@ -106,6 +107,8 @@ func Initialize(configDir string, cryptoKey string) (func(), error) {
|
||||
|
||||
go version.GetGithubVersion()
|
||||
|
||||
go cron.CleanNotifyRecord(ctx, config.Center.CleanNotifyRecordDay)
|
||||
|
||||
alertrtRouter := alertrt.New(config.HTTP, config.Alert, alertMuteCache, targetCache, busiGroupCache, alertStats, ctx, externalProcessors)
|
||||
centerRouter := centerrt.New(config.HTTP, config.Center, config.Alert, config.Ibex, cconf.Operations, dsCache, notifyConfigCache, promClients, tdengineClients,
|
||||
redis, sso, ctx, metas, idents, targetCache, userCache, userGroupCache)
|
||||
|
||||
@@ -315,6 +315,8 @@ func (rt *Router) Config(r *gin.Engine) {
|
||||
pages.GET("/busi-group/:id/alert-rules", rt.auth(), rt.user(), rt.perm("/alert-rules"), rt.alertRuleGets)
|
||||
pages.POST("/busi-group/:id/alert-rules", rt.auth(), rt.user(), rt.perm("/alert-rules/add"), rt.bgrw(), rt.alertRuleAddByFE)
|
||||
pages.POST("/busi-group/:id/alert-rules/import", rt.auth(), rt.user(), rt.perm("/alert-rules/add"), rt.bgrw(), rt.alertRuleAddByImport)
|
||||
pages.POST("/busi-group/:id/alert-rules/import-prom-rule", rt.auth(),
|
||||
rt.user(), rt.perm("/alert-rules/add"), rt.bgrw(), rt.alertRuleAddByImportPromRule)
|
||||
pages.DELETE("/busi-group/:id/alert-rules", rt.auth(), rt.user(), rt.perm("/alert-rules/del"), rt.bgrw(), rt.alertRuleDel)
|
||||
pages.PUT("/busi-group/:id/alert-rules/fields", rt.auth(), rt.user(), rt.perm("/alert-rules/put"), rt.bgrw(), rt.alertRulePutFields)
|
||||
pages.PUT("/busi-group/:id/alert-rule/:arid", rt.auth(), rt.user(), rt.perm("/alert-rules/put"), rt.alertRulePutByFE)
|
||||
@@ -322,6 +324,7 @@ func (rt *Router) Config(r *gin.Engine) {
|
||||
pages.GET("/alert-rule/:arid/pure", rt.auth(), rt.user(), rt.perm("/alert-rules"), rt.alertRulePureGet)
|
||||
pages.PUT("/busi-group/alert-rule/validate", rt.auth(), rt.user(), rt.perm("/alert-rules/put"), rt.alertRuleValidation)
|
||||
pages.POST("/relabel-test", rt.auth(), rt.user(), rt.relabelTest)
|
||||
pages.POST("/busi-group/:id/alert-rules/clone", rt.auth(), rt.user(), rt.perm("/alert-rules/post"), rt.bgrw(), rt.cloneToMachine)
|
||||
|
||||
pages.GET("/busi-groups/recording-rules", rt.auth(), rt.user(), rt.perm("/recording-rules"), rt.recordingRuleGetsByGids)
|
||||
pages.GET("/busi-group/:id/recording-rules", rt.auth(), rt.user(), rt.perm("/recording-rules"), rt.recordingRuleGets)
|
||||
@@ -350,9 +353,11 @@ func (rt *Router) Config(r *gin.Engine) {
|
||||
if rt.Center.AnonymousAccess.AlertDetail {
|
||||
pages.GET("/alert-cur-event/:eid", rt.alertCurEventGet)
|
||||
pages.GET("/alert-his-event/:eid", rt.alertHisEventGet)
|
||||
pages.GET("/event-notify-records/:eid", rt.notificationRecordList)
|
||||
} else {
|
||||
pages.GET("/alert-cur-event/:eid", rt.auth(), rt.user(), rt.alertCurEventGet)
|
||||
pages.GET("/alert-his-event/:eid", rt.auth(), rt.user(), rt.alertHisEventGet)
|
||||
pages.GET("/event-notify-records/:eid", rt.auth(), rt.user(), rt.notificationRecordList)
|
||||
}
|
||||
|
||||
// card logic
|
||||
@@ -551,6 +556,9 @@ func (rt *Router) Config(r *gin.Engine) {
|
||||
|
||||
service.GET("/targets-of-alert-rule", rt.targetsOfAlertRule)
|
||||
|
||||
service.POST("/notify-record", rt.notificationRecordAdd)
|
||||
|
||||
service.GET("/alert-cur-events-del-by-hash", rt.alertCurEventDelByHash)
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
@@ -222,6 +222,11 @@ func (rt *Router) alertCurEventGet(c *gin.Context) {
|
||||
rt.bgroCheck(c, event.GroupId)
|
||||
}
|
||||
|
||||
ruleConfig, needReset := models.FillRuleConfigTplName(rt.Ctx, event.RuleConfig)
|
||||
if needReset {
|
||||
event.RuleConfigJson = ruleConfig
|
||||
}
|
||||
|
||||
ginx.NewRender(c).Data(event, nil)
|
||||
}
|
||||
|
||||
@@ -229,3 +234,8 @@ func (rt *Router) alertCurEventsStatistics(c *gin.Context) {
|
||||
|
||||
ginx.NewRender(c).Data(models.AlertCurEventStatistics(rt.Ctx, time.Now()), nil)
|
||||
}
|
||||
|
||||
func (rt *Router) alertCurEventDelByHash(c *gin.Context) {
|
||||
hash := ginx.QueryStr(c, "hash")
|
||||
ginx.NewRender(c).Message(models.AlertCurEventDelByHash(rt.Ctx, hash))
|
||||
}
|
||||
|
||||
@@ -87,6 +87,11 @@ func (rt *Router) alertHisEventGet(c *gin.Context) {
|
||||
rt.bgroCheck(c, event.GroupId)
|
||||
}
|
||||
|
||||
ruleConfig, needReset := models.FillRuleConfigTplName(rt.Ctx, event.RuleConfig)
|
||||
if needReset {
|
||||
event.RuleConfigJson = ruleConfig
|
||||
}
|
||||
|
||||
ginx.NewRender(c).Data(event, err)
|
||||
}
|
||||
|
||||
|
||||
@@ -1,8 +1,10 @@
|
||||
package router
|
||||
|
||||
import (
|
||||
"encoding/json"
|
||||
"fmt"
|
||||
"net/http"
|
||||
"regexp"
|
||||
"strconv"
|
||||
"strings"
|
||||
"time"
|
||||
@@ -12,6 +14,7 @@ import (
|
||||
"github.com/ccfos/nightingale/v6/pushgw/writer"
|
||||
|
||||
"github.com/gin-gonic/gin"
|
||||
"github.com/jinzhu/copier"
|
||||
"github.com/prometheus/prometheus/prompb"
|
||||
"github.com/toolkits/pkg/ginx"
|
||||
"github.com/toolkits/pkg/i18n"
|
||||
@@ -125,6 +128,25 @@ func (rt *Router) alertRuleAddByImport(c *gin.Context) {
|
||||
ginx.NewRender(c).Data(reterr, nil)
|
||||
}
|
||||
|
||||
func (rt *Router) alertRuleAddByImportPromRule(c *gin.Context) {
|
||||
username := c.MustGet("username").(string)
|
||||
|
||||
type PromRule struct {
|
||||
Groups []models.PromRuleGroup `yaml:"groups"`
|
||||
}
|
||||
var pr PromRule
|
||||
ginx.Dangerous(c.BindYAML(&pr))
|
||||
if len(pr.Groups) == 0 {
|
||||
ginx.Bomb(http.StatusBadRequest, "input yaml is empty")
|
||||
}
|
||||
|
||||
lst := models.DealPromGroup(pr.Groups)
|
||||
bgid := ginx.UrlParamInt64(c, "id")
|
||||
err := rt.alertRuleAdd(lst, username, bgid, c.GetHeader("X-Language"))
|
||||
|
||||
ginx.NewRender(c).Data(err, nil)
|
||||
}
|
||||
|
||||
func (rt *Router) alertRuleAddByService(c *gin.Context) {
|
||||
var lst []models.AlertRule
|
||||
ginx.BindJSON(c, &lst)
|
||||
@@ -275,6 +297,43 @@ func (rt *Router) alertRulePutFields(c *gin.Context) {
|
||||
continue
|
||||
}
|
||||
|
||||
if f.Action == "update_triggers" {
|
||||
if triggers, has := f.Fields["triggers"]; has {
|
||||
originRule := ar.RuleConfigJson.(map[string]interface{})
|
||||
originRule["triggers"] = triggers
|
||||
b, err := json.Marshal(originRule)
|
||||
ginx.Dangerous(err)
|
||||
ginx.Dangerous(ar.UpdateFieldsMap(rt.Ctx, map[string]interface{}{"rule_config": string(b)}))
|
||||
continue
|
||||
}
|
||||
}
|
||||
|
||||
if f.Action == "annotations_add" {
|
||||
if annotations, has := f.Fields["annotations"]; has {
|
||||
annotationsMap := annotations.(map[string]interface{})
|
||||
for k, v := range annotationsMap {
|
||||
ar.AnnotationsJSON[k] = v.(string)
|
||||
}
|
||||
b, err := json.Marshal(ar.AnnotationsJSON)
|
||||
ginx.Dangerous(err)
|
||||
ginx.Dangerous(ar.UpdateFieldsMap(rt.Ctx, map[string]interface{}{"annotations": string(b)}))
|
||||
continue
|
||||
}
|
||||
}
|
||||
|
||||
if f.Action == "annotations_del" {
|
||||
if annotations, has := f.Fields["annotations"]; has {
|
||||
annotationsKeys := annotations.(map[string]interface{})
|
||||
for key := range annotationsKeys {
|
||||
delete(ar.AnnotationsJSON, key)
|
||||
}
|
||||
b, err := json.Marshal(ar.AnnotationsJSON)
|
||||
ginx.Dangerous(err)
|
||||
ginx.Dangerous(ar.UpdateFieldsMap(rt.Ctx, map[string]interface{}{"annotations": string(b)}))
|
||||
continue
|
||||
}
|
||||
}
|
||||
|
||||
if f.Action == "callback_add" {
|
||||
// 增加一个 callback 地址
|
||||
if callbacks, has := f.Fields["callbacks"]; has {
|
||||
@@ -453,3 +512,84 @@ func (rt *Router) relabelTest(c *gin.Context) {
|
||||
|
||||
ginx.NewRender(c).Data(tags, nil)
|
||||
}
|
||||
|
||||
type identListForm struct {
|
||||
Ids []int64 `json:"ids"`
|
||||
IdentList []string `json:"ident_list"`
|
||||
}
|
||||
|
||||
func (rt *Router) cloneToMachine(c *gin.Context) {
|
||||
var f identListForm
|
||||
ginx.BindJSON(c, &f)
|
||||
|
||||
if len(f.IdentList) == 0 {
|
||||
ginx.Bomb(http.StatusBadRequest, "ident_list is empty")
|
||||
}
|
||||
|
||||
alertRules, err := models.AlertRuleGetsByIds(rt.Ctx, f.Ids)
|
||||
ginx.Dangerous(err)
|
||||
|
||||
re := regexp.MustCompile("ident = \".*?\"")
|
||||
|
||||
user := c.MustGet("username").(string)
|
||||
now := time.Now().Unix()
|
||||
|
||||
newRules := make([]*models.AlertRule, 0)
|
||||
|
||||
reterr := make(map[string]map[string]string)
|
||||
|
||||
for i := range alertRules {
|
||||
reterr[alertRules[i].Name] = make(map[string]string)
|
||||
|
||||
if alertRules[i].Cate != "prometheus" {
|
||||
reterr[alertRules[i].Name]["all"] = "Only Prometheus rules can be cloned to machines"
|
||||
continue
|
||||
}
|
||||
var ruleConfig *models.PromRuleConfig
|
||||
if err := json.Unmarshal([]byte(alertRules[i].RuleConfig), &ruleConfig); err != nil {
|
||||
reterr[alertRules[i].Name]["all"] = "invalid rule, fail to unmarshal rule config"
|
||||
continue
|
||||
}
|
||||
|
||||
for j := range f.IdentList {
|
||||
|
||||
for q := range ruleConfig.Queries {
|
||||
ruleConfig.Queries[q].PromQl = re.ReplaceAllString(ruleConfig.Queries[q].PromQl, fmt.Sprintf("ident = \"%s\"", f.IdentList[j]))
|
||||
}
|
||||
|
||||
configJson, err := json.Marshal(ruleConfig)
|
||||
if err != nil {
|
||||
reterr[alertRules[i].Name][f.IdentList[j]] = fmt.Sprintf("invalid rule, fail to marshal rule config, err: %s", err)
|
||||
continue
|
||||
}
|
||||
|
||||
newRule := &models.AlertRule{}
|
||||
if err := copier.Copy(newRule, alertRules[i]); err != nil {
|
||||
reterr[alertRules[i].Name][f.IdentList[j]] = fmt.Sprintf("fail to clone rule, err: %s", err)
|
||||
continue
|
||||
}
|
||||
newRule.Id = 0
|
||||
newRule.Name = alertRules[i].Name + "_" + f.IdentList[j]
|
||||
newRule.CreateBy = user
|
||||
newRule.UpdateBy = user
|
||||
newRule.UpdateAt = now
|
||||
newRule.CreateAt = now
|
||||
newRule.RuleConfig = string(configJson)
|
||||
|
||||
exist, err := models.AlertRuleExists(rt.Ctx, 0, newRule.GroupId, newRule.DatasourceIdsJson, newRule.Name)
|
||||
if err != nil {
|
||||
reterr[alertRules[i].Name][f.IdentList[j]] = err.Error()
|
||||
continue
|
||||
}
|
||||
|
||||
if exist {
|
||||
reterr[alertRules[i].Name][f.IdentList[j]] = fmt.Sprintf("rule already exists, ruleName: %s", newRule.Name)
|
||||
continue
|
||||
}
|
||||
|
||||
newRules = append(newRules, newRule)
|
||||
}
|
||||
}
|
||||
|
||||
ginx.NewRender(c).Data(reterr, models.InsertAlertRule(rt.Ctx, newRules))
|
||||
}
|
||||
|
||||
@@ -3,7 +3,7 @@ package router
|
||||
import (
|
||||
"compress/gzip"
|
||||
"encoding/json"
|
||||
"fmt"
|
||||
"errors"
|
||||
"io/ioutil"
|
||||
"strings"
|
||||
"time"
|
||||
@@ -57,7 +57,7 @@ func HandleHeartbeat(c *gin.Context, ctx *ctx.Context, engineName string, metaSe
|
||||
}
|
||||
|
||||
if req.Hostname == "" {
|
||||
return req, fmt.Errorf("hostname is required", 400)
|
||||
return req, errors.New("hostname is required")
|
||||
}
|
||||
|
||||
// maybe from pushgw
|
||||
@@ -121,6 +121,10 @@ func HandleHeartbeat(c *gin.Context, ctx *ctx.Context, engineName string, metaSe
|
||||
field["agent_version"] = req.AgentVersion
|
||||
}
|
||||
|
||||
if req.OS != "" && req.OS != target.OS {
|
||||
field["os"] = req.OS
|
||||
}
|
||||
|
||||
if len(field) > 0 {
|
||||
err := target.UpdateFieldsMap(ctx, field)
|
||||
if err != nil {
|
||||
|
||||
205
center/router/router_notification_record.go
Normal file
205
center/router/router_notification_record.go
Normal file
@@ -0,0 +1,205 @@
|
||||
package router
|
||||
|
||||
import (
|
||||
"strings"
|
||||
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
"github.com/ccfos/nightingale/v6/pkg/ctx"
|
||||
|
||||
"github.com/gin-gonic/gin"
|
||||
"github.com/toolkits/pkg/ginx"
|
||||
"github.com/toolkits/pkg/logger"
|
||||
)
|
||||
|
||||
type NotificationResponse struct {
|
||||
SubRules []SubRule `json:"sub_rules"`
|
||||
Notifies map[string][]Record `json:"notifies"`
|
||||
}
|
||||
|
||||
type SubRule struct {
|
||||
SubID int64 `json:"sub_id"`
|
||||
Notifies map[string][]Record `json:"notifies"`
|
||||
}
|
||||
|
||||
type Notify struct {
|
||||
Channel string `json:"channel"`
|
||||
Records []Record `json:"records"`
|
||||
}
|
||||
|
||||
type Record struct {
|
||||
Target string `json:"target"`
|
||||
Username string `json:"username"`
|
||||
Status int `json:"status"`
|
||||
Detail string `json:"detail"`
|
||||
}
|
||||
|
||||
// notificationRecordAdd
|
||||
func (rt *Router) notificationRecordAdd(c *gin.Context) {
|
||||
var req models.NotificaitonRecord
|
||||
ginx.BindJSON(c, &req)
|
||||
err := req.Add(rt.Ctx)
|
||||
|
||||
ginx.NewRender(c).Data(req.Id, err)
|
||||
}
|
||||
|
||||
func (rt *Router) notificationRecordList(c *gin.Context) {
|
||||
eid := ginx.UrlParamInt64(c, "eid")
|
||||
lst, err := models.NotificaitonRecordsGetByEventId(rt.Ctx, eid)
|
||||
ginx.Dangerous(err)
|
||||
|
||||
response := buildNotificationResponse(rt.Ctx, lst)
|
||||
ginx.NewRender(c).Data(response, nil)
|
||||
}
|
||||
|
||||
func buildNotificationResponse(ctx *ctx.Context, nl []*models.NotificaitonRecord) NotificationResponse {
|
||||
response := NotificationResponse{
|
||||
SubRules: []SubRule{},
|
||||
Notifies: make(map[string][]Record),
|
||||
}
|
||||
|
||||
subRuleMap := make(map[int64]*SubRule)
|
||||
|
||||
// Collect all group IDs
|
||||
groupIdSet := make(map[int64]struct{})
|
||||
|
||||
// map[SubId]map[Channel]map[Target]index
|
||||
filter := make(map[int64]map[string]map[string]int)
|
||||
|
||||
for i, n := range nl {
|
||||
// 对相同的 channel-target 进行合并
|
||||
for _, gid := range n.GetGroupIds(ctx) {
|
||||
groupIdSet[gid] = struct{}{}
|
||||
}
|
||||
|
||||
if _, exists := filter[n.SubId]; !exists {
|
||||
filter[n.SubId] = make(map[string]map[string]int)
|
||||
}
|
||||
|
||||
if _, exists := filter[n.SubId][n.Channel]; !exists {
|
||||
filter[n.SubId][n.Channel] = make(map[string]int)
|
||||
}
|
||||
|
||||
idx, exists := filter[n.SubId][n.Channel][n.Target]
|
||||
if !exists {
|
||||
filter[n.SubId][n.Channel][n.Target] = i
|
||||
} else {
|
||||
if nl[idx].Status < n.Status {
|
||||
nl[idx].Status = n.Status
|
||||
}
|
||||
nl[idx].Details = nl[idx].Details + ", " + n.Details
|
||||
nl[i] = nil
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
// Fill usernames only once
|
||||
usernameByTarget := fillUserNames(ctx, groupIdSet)
|
||||
|
||||
for _, n := range nl {
|
||||
if n == nil {
|
||||
continue
|
||||
}
|
||||
|
||||
m := usernameByTarget[n.Target]
|
||||
usernames := make([]string, 0, len(m))
|
||||
for k := range m {
|
||||
usernames = append(usernames, k)
|
||||
}
|
||||
|
||||
if !checkChannel(n.Channel) {
|
||||
// Hide sensitive information
|
||||
n.Target = replaceLastEightChars(n.Target)
|
||||
}
|
||||
record := Record{
|
||||
Target: n.Target,
|
||||
Status: n.Status,
|
||||
Detail: n.Details,
|
||||
}
|
||||
|
||||
record.Username = strings.Join(usernames, ",")
|
||||
|
||||
if n.SubId > 0 {
|
||||
// Handle SubRules
|
||||
subRule, ok := subRuleMap[n.SubId]
|
||||
if !ok {
|
||||
newSubRule := &SubRule{
|
||||
SubID: n.SubId,
|
||||
}
|
||||
newSubRule.Notifies = make(map[string][]Record)
|
||||
newSubRule.Notifies[n.Channel] = []Record{record}
|
||||
|
||||
subRuleMap[n.SubId] = newSubRule
|
||||
} else {
|
||||
if _, exists := subRule.Notifies[n.Channel]; !exists {
|
||||
|
||||
subRule.Notifies[n.Channel] = []Record{record}
|
||||
} else {
|
||||
subRule.Notifies[n.Channel] = append(subRule.Notifies[n.Channel], record)
|
||||
}
|
||||
}
|
||||
continue
|
||||
}
|
||||
|
||||
if response.Notifies == nil {
|
||||
response.Notifies = make(map[string][]Record)
|
||||
}
|
||||
|
||||
if _, exists := response.Notifies[n.Channel]; !exists {
|
||||
response.Notifies[n.Channel] = []Record{record}
|
||||
} else {
|
||||
response.Notifies[n.Channel] = append(response.Notifies[n.Channel], record)
|
||||
}
|
||||
}
|
||||
|
||||
for _, subRule := range subRuleMap {
|
||||
response.SubRules = append(response.SubRules, *subRule)
|
||||
}
|
||||
|
||||
return response
|
||||
}
|
||||
|
||||
// check channel is one of the following: tx-sms, tx-voice, ali-sms, ali-voice, email, script
|
||||
func checkChannel(channel string) bool {
|
||||
switch channel {
|
||||
case "tx-sms", "tx-voice", "ali-sms", "ali-voice", "email", "script":
|
||||
return true
|
||||
}
|
||||
return false
|
||||
}
|
||||
|
||||
func replaceLastEightChars(s string) string {
|
||||
if len(s) <= 8 {
|
||||
return strings.Repeat("*", len(s))
|
||||
}
|
||||
return s[:len(s)-8] + strings.Repeat("*", 8)
|
||||
}
|
||||
|
||||
func fillUserNames(ctx *ctx.Context, groupIdSet map[int64]struct{}) map[string]map[string]struct{} {
|
||||
userNameByTarget := make(map[string]map[string]struct{})
|
||||
|
||||
gids := make([]int64, 0, len(groupIdSet))
|
||||
for gid := range groupIdSet {
|
||||
gids = append(gids, gid)
|
||||
}
|
||||
|
||||
users, err := models.UsersGetByGroupIds(ctx, gids)
|
||||
if err != nil {
|
||||
logger.Errorf("UsersGetByGroupIds failed, err: %v", err)
|
||||
return userNameByTarget
|
||||
}
|
||||
|
||||
for _, user := range users {
|
||||
logger.Warningf("user: %s", user.Username)
|
||||
for _, ch := range models.DefaultChannels {
|
||||
target, exist := user.ExtractToken(ch)
|
||||
if exist {
|
||||
if _, ok := userNameByTarget[target]; !ok {
|
||||
userNameByTarget[target] = make(map[string]struct{})
|
||||
}
|
||||
userNameByTarget[target][user.Username] = struct{}{}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return userNameByTarget
|
||||
}
|
||||
@@ -182,7 +182,7 @@ func (rt *Router) notifyConfigPut(c *gin.Context) {
|
||||
|
||||
smtp, errSmtp := SmtpValidate(text)
|
||||
ginx.Dangerous(errSmtp)
|
||||
go sender.RestartEmailSender(smtp)
|
||||
go sender.RestartEmailSender(rt.Ctx, smtp)
|
||||
}
|
||||
|
||||
ginx.NewRender(c).Message(nil)
|
||||
|
||||
@@ -196,6 +196,13 @@ func (rt *Router) taskTplDel(c *gin.Context) {
|
||||
return
|
||||
}
|
||||
|
||||
ids, err := models.GetAlertRuleIdsByTaskId(rt.Ctx, tid)
|
||||
ginx.Dangerous(err)
|
||||
if len(ids) > 0 {
|
||||
ginx.NewRender(c).Message("can't del this task tpl, used by alert rule ids(%v) ", ids)
|
||||
return
|
||||
}
|
||||
|
||||
ginx.NewRender(c).Message(tpl.Del(rt.Ctx))
|
||||
}
|
||||
|
||||
|
||||
@@ -1,6 +1,7 @@
|
||||
package sso
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"log"
|
||||
"time"
|
||||
|
||||
@@ -145,7 +146,7 @@ func Init(center cconf.Center, ctx *ctx.Context, configCache *memsto.ConfigCache
|
||||
}
|
||||
}
|
||||
if configCache == nil {
|
||||
logger.Error("configCache is nil, sso initialization failed")
|
||||
log.Fatalln(fmt.Errorf("configCache is nil, sso initialization failed"))
|
||||
}
|
||||
ssoClient.configCache = configCache
|
||||
userVariableMap := configCache.Get()
|
||||
|
||||
41
cron/clean_notify_record.go
Normal file
41
cron/clean_notify_record.go
Normal file
@@ -0,0 +1,41 @@
|
||||
package cron
|
||||
|
||||
import (
|
||||
"time"
|
||||
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
"github.com/ccfos/nightingale/v6/pkg/ctx"
|
||||
|
||||
"github.com/robfig/cron/v3"
|
||||
"github.com/toolkits/pkg/logger"
|
||||
)
|
||||
|
||||
func cleanNotifyRecord(ctx *ctx.Context, day int) {
|
||||
lastWeek := time.Now().Unix() - 86400*int64(day)
|
||||
err := models.DB(ctx).Model(&models.NotificaitonRecord{}).Where("created_at < ?", lastWeek).Delete(&models.NotificaitonRecord{}).Error
|
||||
if err != nil {
|
||||
logger.Errorf("Failed to clean notify record: %v", err)
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
// 每天凌晨1点执行清理任务
|
||||
func CleanNotifyRecord(ctx *ctx.Context, day int) {
|
||||
c := cron.New()
|
||||
if day < 1 {
|
||||
day = 7
|
||||
}
|
||||
|
||||
// 使用cron表达式设置每天凌晨1点执行
|
||||
_, err := c.AddFunc("0 1 * * *", func() {
|
||||
cleanNotifyRecord(ctx, day)
|
||||
})
|
||||
|
||||
if err != nil {
|
||||
logger.Errorf("Failed to add clean notify record cron job: %v", err)
|
||||
return
|
||||
}
|
||||
|
||||
// 启动cron任务
|
||||
c.Start()
|
||||
}
|
||||
@@ -366,6 +366,7 @@ CREATE TABLE `target` (
|
||||
`host_ip` varchar(15) default '' COMMENT 'IPv4 string',
|
||||
`agent_version` varchar(255) default '' COMMENT 'agent version',
|
||||
`engine_name` varchar(255) default '' COMMENT 'engine_name',
|
||||
`os` VARCHAR(31) DEFAULT '' COMMENT 'os type',
|
||||
`update_at` bigint not null default 0,
|
||||
PRIMARY KEY (`id`),
|
||||
UNIQUE KEY (`ident`),
|
||||
@@ -546,6 +547,18 @@ CREATE TABLE `builtin_payloads` (
|
||||
KEY `idx_type` (`type`)
|
||||
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
|
||||
|
||||
CREATE TABLE notification_record (
|
||||
`id` BIGINT PRIMARY KEY AUTO_INCREMENT,
|
||||
`event_id` BIGINT NOT NULL,
|
||||
`sub_id` BIGINT NOT NULL,
|
||||
`channel` VARCHAR(255) NOT NULL,
|
||||
`status` TINYINT NOT NULL DEFAULT 0,
|
||||
`target` VARCHAR(1024) NOT NULL,
|
||||
`details` VARCHAR(2048),
|
||||
`created_at` BIGINT NOT NULL,
|
||||
INDEX idx_evt (event_id)
|
||||
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
|
||||
|
||||
CREATE TABLE `task_tpl`
|
||||
(
|
||||
`id` int unsigned NOT NULL AUTO_INCREMENT,
|
||||
|
||||
@@ -83,4 +83,20 @@ ALTER TABLE recording_rule ADD COLUMN cron_pattern VARCHAR(255) DEFAULT '' COMME
|
||||
|
||||
/* v7.0.0-beta.14 */
|
||||
ALTER TABLE alert_cur_event ADD COLUMN original_tags TEXT COMMENT 'labels key=val,,k2=v2';
|
||||
ALTER TABLE alert_his_event ADD COLUMN original_tags TEXT COMMENT 'labels key=val,,k2=v2';
|
||||
ALTER TABLE alert_his_event ADD COLUMN original_tags TEXT COMMENT 'labels key=val,,k2=v2';
|
||||
|
||||
/* v7.1.0 */
|
||||
ALTER TABLE target ADD COLUMN os VARCHAR(31) DEFAULT '' COMMENT 'os type';
|
||||
|
||||
/* v7.2.0 */
|
||||
CREATE TABLE notification_record (
|
||||
`id` BIGINT PRIMARY KEY AUTO_INCREMENT,
|
||||
`event_id` BIGINT NOT NULL,
|
||||
`sub_id` BIGINT NOT NULL,
|
||||
`channel` VARCHAR(255) NOT NULL,
|
||||
`status` TINYINT NOT NULL DEFAULT 0,
|
||||
`target` VARCHAR(1024) NOT NULL,
|
||||
`details` VARCHAR(2048),
|
||||
`created_at` BIGINT NOT NULL,
|
||||
INDEX idx_evt (event_id)
|
||||
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
|
||||
1
go.mod
1
go.mod
@@ -18,6 +18,7 @@ require (
|
||||
github.com/golang/snappy v0.0.4
|
||||
github.com/google/uuid v1.3.0
|
||||
github.com/hashicorp/go-version v1.6.0
|
||||
github.com/jinzhu/copier v0.4.0
|
||||
github.com/json-iterator/go v1.1.12
|
||||
github.com/koding/multiconfig v0.0.0-20171124222453-69c27309b2d7
|
||||
github.com/mailru/easyjson v0.7.7
|
||||
|
||||
2
go.sum
2
go.sum
@@ -160,6 +160,8 @@ github.com/jackc/puddle v0.0.0-20190413234325-e4ced69a3a2b/go.mod h1:m4B5Dj62Y0f
|
||||
github.com/jackc/puddle v0.0.0-20190608224051-11cab39313c9/go.mod h1:m4B5Dj62Y0fbyuIc15OsIqK0+JU8nkqQjsgx7dvjSWk=
|
||||
github.com/jackc/puddle v1.1.3/go.mod h1:m4B5Dj62Y0fbyuIc15OsIqK0+JU8nkqQjsgx7dvjSWk=
|
||||
github.com/jackc/puddle v1.3.0/go.mod h1:m4B5Dj62Y0fbyuIc15OsIqK0+JU8nkqQjsgx7dvjSWk=
|
||||
github.com/jinzhu/copier v0.4.0 h1:w3ciUoD19shMCRargcpm0cm91ytaBhDvuRpz1ODO/U8=
|
||||
github.com/jinzhu/copier v0.4.0/go.mod h1:DfbEm0FYsaqBcKcFuvmOZb218JkPGtvSHsKg8S8hyyg=
|
||||
github.com/jinzhu/inflection v1.0.0 h1:K317FqzuhWc8YvSVlFMCCUb36O/S9MCKRDI7QkRKD/E=
|
||||
github.com/jinzhu/inflection v1.0.0/go.mod h1:h+uFLlag+Qp1Va5pdKtLDYj+kHp5pxUVkryuEj+Srlc=
|
||||
github.com/jinzhu/now v1.1.4/go.mod h1:d3SSVoowX0Lcu0IBviAWJpolVfI5UJVZZ7cO71lE/z8=
|
||||
|
||||
@@ -192,7 +192,7 @@
|
||||
"prom_ql": "",
|
||||
"queries": [
|
||||
{
|
||||
"prom_ql": "elasticsearch_filesystem_data_available_bytes / elasticsearch_filesystem_data_size_bytes * 100 \u003c 10",
|
||||
"prom_ql": "elasticsearch_filesystem_data_available_bytes / elasticsearch_filesystem_data_size_in_bytes * 100 \u003c 10",
|
||||
"severity": 1
|
||||
}
|
||||
],
|
||||
@@ -275,7 +275,7 @@
|
||||
"prom_ql": "",
|
||||
"queries": [
|
||||
{
|
||||
"prom_ql": "elasticsearch_filesystem_data_available_bytes / elasticsearch_filesystem_data_size_bytes * 100 \u003c 20",
|
||||
"prom_ql": "elasticsearch_filesystem_data_available_bytes / elasticsearch_filesystem_data_size_in_bytes * 100 \u003c 20",
|
||||
"severity": 2
|
||||
}
|
||||
],
|
||||
@@ -1078,4 +1078,4 @@
|
||||
"update_by": "",
|
||||
"uuid": 1717556327360313000
|
||||
}
|
||||
]
|
||||
]
|
||||
|
||||
@@ -4,7 +4,6 @@ ElasticSearch 通过 HTTP JSON 的方式暴露了自身的监控指标,通过
|
||||
|
||||
如果是小规模集群,设置 `local=false`,从集群中某一个节点抓取数据,即可拿到整个集群所有节点的监控数据。如果是大规模集群,建议设置 `local=true`,在集群的每个节点上都部署抓取器,抓取本地 elasticsearch 进程的监控数据。
|
||||
|
||||
ElasticSearch 详细的监控讲解,请参考这篇 [文章](https://time.geekbang.org/column/article/628847)。
|
||||
|
||||
## 配置示例
|
||||
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
# kafka plugin
|
||||
|
||||
Kafka 的核心指标,其实都是通过 JMX 的方式暴露的,可以参考这篇 [文章](https://time.geekbang.org/column/article/628498)。对于 JMX 暴露的指标,使用 jolokia 或者使用 jmx_exporter 那个 jar 包来采集即可,不需要本插件。
|
||||
Kafka 的核心指标,其实都是通过 JMX 的方式暴露的。对于 JMX 暴露的指标,使用 jolokia 或者使用 jmx_exporter 那个 jar 包来采集即可,不需要本插件。
|
||||
|
||||
本插件主要是采集的消费者延迟数据,这个数据无法通过 Kafka 服务端的 JMX 拿到。
|
||||
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
# Kubernetes
|
||||
|
||||
这个插件已经废弃。Kubernetes 监控系列可以参考这个 [文章](https://flashcat.cloud/categories/kubernetes%E7%9B%91%E6%8E%A7%E4%B8%93%E6%A0%8F/)。或者参考 [专栏](https://time.geekbang.org/column/article/630306)。
|
||||
这个插件已经废弃。Kubernetes 监控系列可以参考这个 [文章](https://flashcat.cloud/categories/kubernetes%E7%9B%91%E6%8E%A7%E4%B8%93%E6%A0%8F/)。
|
||||
|
||||
不过 Kubernetes 这个类别下的内置告警规则和内置仪表盘都是可以使用的。
|
||||
|
||||
|
||||
@@ -2051,7 +2051,7 @@
|
||||
"cate": "prometheus",
|
||||
"value": "${prom}"
|
||||
},
|
||||
"definition": "label_values(node_cpu_seconds_total, instance)",
|
||||
"definition": "label_values(node_uname_info, instance)",
|
||||
"name": "node",
|
||||
"selected": "$node",
|
||||
"type": "query"
|
||||
|
||||
@@ -252,7 +252,7 @@
|
||||
"cate": "prometheus",
|
||||
"value": "${prom}"
|
||||
},
|
||||
"definition": "label_values(node_cpu_seconds_total, instance)",
|
||||
"definition": "label_values(node_uname_info, instance)",
|
||||
"name": "node",
|
||||
"selected": "$node",
|
||||
"type": "query"
|
||||
|
||||
@@ -305,8 +305,12 @@ func (e *AlertCurEvent) DB2FE() error {
|
||||
e.CallbacksJSON = strings.Fields(e.Callbacks)
|
||||
e.TagsJSON = strings.Split(e.Tags, ",,")
|
||||
e.OriginalTagsJSON = strings.Split(e.OriginalTags, ",,")
|
||||
json.Unmarshal([]byte(e.Annotations), &e.AnnotationsJSON)
|
||||
json.Unmarshal([]byte(e.RuleConfig), &e.RuleConfigJson)
|
||||
if err := json.Unmarshal([]byte(e.Annotations), &e.AnnotationsJSON); err != nil {
|
||||
return err
|
||||
}
|
||||
if err := json.Unmarshal([]byte(e.RuleConfig), &e.RuleConfigJson); err != nil {
|
||||
return err
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
@@ -346,6 +350,34 @@ func (e *AlertCurEvent) DB2Mem() {
|
||||
}
|
||||
}
|
||||
|
||||
func FillRuleConfigTplName(ctx *ctx.Context, ruleConfig string) (interface{}, bool) {
|
||||
var config RuleConfig
|
||||
err := json.Unmarshal([]byte(ruleConfig), &config)
|
||||
if err != nil {
|
||||
logger.Warningf("failed to unmarshal rule config: %v", err)
|
||||
return nil, false
|
||||
}
|
||||
|
||||
if len(config.TaskTpls) == 0 {
|
||||
return nil, false
|
||||
}
|
||||
|
||||
for i := 0; i < len(config.TaskTpls); i++ {
|
||||
tpl, err := TaskTplGetById(ctx, config.TaskTpls[i].TplId)
|
||||
if err != nil {
|
||||
logger.Warningf("failed to get task tpl by id:%d, %v", config.TaskTpls[i].TplId, err)
|
||||
return nil, false
|
||||
}
|
||||
|
||||
if tpl == nil {
|
||||
logger.Warningf("task tpl not found by id:%d", config.TaskTpls[i].TplId)
|
||||
return nil, false
|
||||
}
|
||||
config.TaskTpls[i].TplName = tpl.Title
|
||||
}
|
||||
return config, true
|
||||
}
|
||||
|
||||
// for webui
|
||||
func (e *AlertCurEvent) FillNotifyGroups(ctx *ctx.Context, cache map[int64]*UserGroup) error {
|
||||
// some user-group already deleted ?
|
||||
@@ -471,6 +503,11 @@ func AlertCurEventDel(ctx *ctx.Context, ids []int64) error {
|
||||
}
|
||||
|
||||
func AlertCurEventDelByHash(ctx *ctx.Context, hash string) error {
|
||||
if !ctx.IsCenter {
|
||||
_, err := poster.GetByUrls[string](ctx, "/v1/n9e/alert-cur-events-del-by-hash?hash="+hash)
|
||||
return err
|
||||
}
|
||||
|
||||
return DB(ctx).Where("hash = ?", hash).Delete(&AlertCurEvent{}).Error
|
||||
}
|
||||
|
||||
@@ -589,8 +626,8 @@ func AlertCurEventGetMap(ctx *ctx.Context, cluster string) (map[int64]map[string
|
||||
return ret, nil
|
||||
}
|
||||
|
||||
func (m *AlertCurEvent) UpdateFieldsMap(ctx *ctx.Context, fields map[string]interface{}) error {
|
||||
return DB(ctx).Model(m).Updates(fields).Error
|
||||
func (e *AlertCurEvent) UpdateFieldsMap(ctx *ctx.Context, fields map[string]interface{}) error {
|
||||
return DB(ctx).Model(e).Updates(fields).Error
|
||||
}
|
||||
|
||||
func AlertCurEventUpgradeToV6(ctx *ctx.Context, dsm map[string]Datasource) error {
|
||||
|
||||
@@ -117,6 +117,10 @@ func (e *AlertHisEvent) FillNotifyGroups(ctx *ctx.Context, cache map[int64]*User
|
||||
return nil
|
||||
}
|
||||
|
||||
// func (e *AlertHisEvent) FillTaskTplName(ctx *ctx.Context, cache map[int64]*UserGroup) error {
|
||||
|
||||
// }
|
||||
|
||||
func AlertHisEventTotal(ctx *ctx.Context, prods []string, bgids []int64, stime, etime int64, severity int, recovered int, dsIds []int64, cates []string, query string) (int64, error) {
|
||||
session := DB(ctx).Model(&AlertHisEvent{}).Where("last_eval_time between ? and ?", stime, etime)
|
||||
|
||||
|
||||
@@ -11,6 +11,7 @@ import (
|
||||
"github.com/ccfos/nightingale/v6/pkg/poster"
|
||||
"github.com/ccfos/nightingale/v6/pushgw/pconf"
|
||||
|
||||
"github.com/jinzhu/copier"
|
||||
"github.com/pkg/errors"
|
||||
"github.com/toolkits/pkg/logger"
|
||||
"github.com/toolkits/pkg/str"
|
||||
@@ -26,6 +27,21 @@ const (
|
||||
TDENGINE = "tdengine"
|
||||
)
|
||||
|
||||
const (
|
||||
AlertRuleEnabled = 0
|
||||
AlertRuleDisabled = 1
|
||||
|
||||
AlertRuleEnableInGlobalBG = 0
|
||||
AlertRuleEnableInOneBG = 1
|
||||
|
||||
AlertRuleNotNotifyRecovered = 0
|
||||
AlertRuleNotifyRecovered = 1
|
||||
|
||||
AlertRuleNotifyRepeatStep60Min = 60
|
||||
|
||||
AlertRuleRecoverDuration0Sec = 0
|
||||
)
|
||||
|
||||
type AlertRule struct {
|
||||
Id int64 `json:"id" gorm:"primaryKey"`
|
||||
GroupId int64 `json:"group_id"` // busi group id
|
||||
@@ -84,6 +100,24 @@ type AlertRule struct {
|
||||
UUID int64 `json:"uuid" gorm:"-"` // tpl identifier
|
||||
}
|
||||
|
||||
type Tpl struct {
|
||||
TplId int64 `json:"tpl_id"`
|
||||
TplName string `json:"tpl_name"`
|
||||
Host []string `json:"host"`
|
||||
}
|
||||
|
||||
type RuleConfig struct {
|
||||
Version string `json:"version,omitempty"`
|
||||
EventRelabelConfig []*pconf.RelabelConfig `json:"event_relabel_config,omitempty"`
|
||||
TaskTpls []*Tpl `json:"task_tpls,omitempty"`
|
||||
Queries interface{} `json:"queries,omitempty"`
|
||||
Triggers []Trigger `json:"triggers,omitempty"`
|
||||
Inhibit bool `json:"inhibit,omitempty"`
|
||||
PromQl string `json:"prom_ql,omitempty"`
|
||||
Severity int `json:"severity,omitempty"`
|
||||
AlgoParams interface{} `json:"algo_params,omitempty"`
|
||||
}
|
||||
|
||||
type PromRuleConfig struct {
|
||||
Queries []PromQuery `json:"queries"`
|
||||
Inhibit bool `json:"inhibit"`
|
||||
@@ -122,6 +156,10 @@ type Trigger struct {
|
||||
Mode int `json:"mode"`
|
||||
Exp string `json:"exp"`
|
||||
Severity int `json:"severity"`
|
||||
|
||||
Type string `json:"type,omitempty"`
|
||||
Duration int `json:"duration,omitempty"`
|
||||
Percent int `json:"percent,omitempty"`
|
||||
}
|
||||
|
||||
func GetHostsQuery(queries []HostQuery) []map[string]interface{} {
|
||||
@@ -422,6 +460,19 @@ func (ar *AlertRule) UpdateColumn(ctx *ctx.Context, column string, value interfa
|
||||
return DB(ctx).Model(ar).UpdateColumn("annotations", string(b)).Error
|
||||
}
|
||||
|
||||
if column == "annotations" {
|
||||
newAnnotations := value.(map[string]interface{})
|
||||
ar.AnnotationsJSON = make(map[string]string)
|
||||
for k, v := range newAnnotations {
|
||||
ar.AnnotationsJSON[k] = v.(string)
|
||||
}
|
||||
b, err := json.Marshal(ar.AnnotationsJSON)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
return DB(ctx).Model(ar).UpdateColumn("annotations", string(b)).Error
|
||||
}
|
||||
|
||||
return DB(ctx).Model(ar).UpdateColumn(column, value).Error
|
||||
}
|
||||
|
||||
@@ -679,6 +730,25 @@ func AlertRuleExists(ctx *ctx.Context, id, groupId int64, datasourceIds []int64,
|
||||
return false, nil
|
||||
}
|
||||
|
||||
func GetAlertRuleIdsByTaskId(ctx *ctx.Context, taskId int64) ([]int64, error) {
|
||||
tpl := "%\"tpl_id\":" + fmt.Sprint(taskId) + "}%"
|
||||
cb := "{ibex}/" + fmt.Sprint(taskId) + "%"
|
||||
session := DB(ctx).Where("rule_config like ? or callbacks like ?", tpl, cb)
|
||||
|
||||
var lst []AlertRule
|
||||
var ids []int64
|
||||
err := session.Find(&lst).Error
|
||||
if err != nil || len(lst) == 0 {
|
||||
return ids, err
|
||||
}
|
||||
|
||||
for i := 0; i < len(lst); i++ {
|
||||
ids = append(ids, lst[i].Id)
|
||||
}
|
||||
|
||||
return ids, nil
|
||||
}
|
||||
|
||||
func AlertRuleGets(ctx *ctx.Context, groupId int64) ([]AlertRule, error) {
|
||||
session := DB(ctx).Where("group_id=?", groupId).Order("name")
|
||||
|
||||
@@ -1011,3 +1081,19 @@ func GetTargetsOfHostAlertRule(ctx *ctx.Context, engineName string) (map[string]
|
||||
|
||||
return m, nil
|
||||
}
|
||||
|
||||
func (ar *AlertRule) Copy(ctx *ctx.Context) (*AlertRule, error) {
|
||||
newAr := &AlertRule{}
|
||||
err := copier.Copy(newAr, ar)
|
||||
if err != nil {
|
||||
logger.Errorf("copy alert rule failed, %v", err)
|
||||
}
|
||||
return newAr, err
|
||||
}
|
||||
|
||||
func InsertAlertRule(ctx *ctx.Context, ars []*AlertRule) error {
|
||||
if len(ars) == 0 {
|
||||
return nil
|
||||
}
|
||||
return DB(ctx).Create(ars).Error
|
||||
}
|
||||
|
||||
@@ -58,7 +58,7 @@ func MigrateTables(db *gorm.DB) error {
|
||||
dts := []interface{}{&RecordingRule{}, &AlertRule{}, &AlertSubscribe{}, &AlertMute{},
|
||||
&TaskRecord{}, &ChartShare{}, &Target{}, &Configs{}, &Datasource{}, &NotifyTpl{},
|
||||
&Board{}, &BoardBusigroup{}, &Users{}, &SsoConfig{}, &models.BuiltinMetric{},
|
||||
&models.MetricFilter{}, &models.BuiltinComponent{}}
|
||||
&models.MetricFilter{}, &models.BuiltinComponent{}, &models.NotificaitonRecord{}}
|
||||
|
||||
if !columnHasIndex(db, &AlertHisEvent{}, "original_tags") ||
|
||||
!columnHasIndex(db, &AlertCurEvent{}, "original_tags") {
|
||||
@@ -230,10 +230,11 @@ type Target struct {
|
||||
HostIp string `gorm:"column:host_ip;varchar(15);default:'';comment:IPv4 string;index:idx_host_ip"`
|
||||
AgentVersion string `gorm:"column:agent_version;varchar(255);default:'';comment:agent version;index:idx_agent_version"`
|
||||
EngineName string `gorm:"column:engine_name;varchar(255);default:'';comment:engine name;index:idx_engine_name"`
|
||||
OS string `gorm:"column:os;varchar(31);default:'';comment:os type;index:idx_os"`
|
||||
}
|
||||
|
||||
type Datasource struct {
|
||||
IsDefault bool `gorm:"column:is_default;type:boolean;not null;comment:is default datasource"`
|
||||
IsDefault bool `gorm:"column:is_default;type:boolean;comment:is default datasource"`
|
||||
}
|
||||
|
||||
type Configs struct {
|
||||
|
||||
91
models/notification_record.go
Normal file
91
models/notification_record.go
Normal file
@@ -0,0 +1,91 @@
|
||||
package models
|
||||
|
||||
import (
|
||||
"github.com/ccfos/nightingale/v6/pkg/ctx"
|
||||
"github.com/toolkits/pkg/logger"
|
||||
"github.com/toolkits/pkg/str"
|
||||
)
|
||||
|
||||
const (
|
||||
NotiStatusSuccess = iota + 1
|
||||
NotiStatusFailure
|
||||
)
|
||||
|
||||
type NotificaitonRecord struct {
|
||||
Id int64 `json:"id" gorm:"primaryKey;type:bigint;autoIncrement"`
|
||||
EventId int64 `json:"event_id" gorm:"type:bigint;not null;index:idx_evt,priority:1;comment:event history id"`
|
||||
SubId int64 `json:"sub_id" gorm:"type:bigint;comment:subscribed rule id"`
|
||||
Channel string `json:"channel" gorm:"type:varchar(255);not null;comment:notification channel name"`
|
||||
Status int `json:"status" gorm:"type:int;comment:notification status"` // 1-成功,2-失败
|
||||
Target string `json:"target" gorm:"type:varchar(1024);not null;comment:notification target"`
|
||||
Details string `json:"details" gorm:"type:varchar(2048);default:'';comment:notification other info"`
|
||||
CreatedAt int64 `json:"created_at" gorm:"type:bigint;not null;comment:create time"`
|
||||
}
|
||||
|
||||
func NewNotificationRecord(event *AlertCurEvent, channel, target string) *NotificaitonRecord {
|
||||
return &NotificaitonRecord{
|
||||
EventId: event.Id,
|
||||
SubId: event.SubRuleId,
|
||||
Channel: channel,
|
||||
Status: NotiStatusSuccess,
|
||||
Target: target,
|
||||
}
|
||||
}
|
||||
|
||||
func (n *NotificaitonRecord) SetStatus(status int) {
|
||||
if n == nil {
|
||||
return
|
||||
}
|
||||
n.Status = status
|
||||
}
|
||||
|
||||
func (n *NotificaitonRecord) SetDetails(details string) {
|
||||
if n == nil {
|
||||
return
|
||||
}
|
||||
n.Details = details
|
||||
}
|
||||
|
||||
func (n *NotificaitonRecord) TableName() string {
|
||||
return "notification_record"
|
||||
}
|
||||
|
||||
func (n *NotificaitonRecord) Add(ctx *ctx.Context) error {
|
||||
return Insert(ctx, n)
|
||||
}
|
||||
|
||||
func (n *NotificaitonRecord) GetGroupIds(ctx *ctx.Context) (groupIds []int64) {
|
||||
if n == nil {
|
||||
return
|
||||
}
|
||||
|
||||
if n.SubId > 0 {
|
||||
if sub, err := AlertSubscribeGet(ctx, "id=?", n.SubId); err != nil {
|
||||
logger.Errorf("AlertSubscribeGet failed, err: %v", err)
|
||||
} else {
|
||||
groupIds = str.IdsInt64(sub.UserGroupIds, " ")
|
||||
}
|
||||
return
|
||||
}
|
||||
|
||||
if event, err := AlertHisEventGetById(ctx, n.EventId); err != nil {
|
||||
logger.Errorf("AlertHisEventGetById failed, err: %v", err)
|
||||
} else {
|
||||
groupIds = str.IdsInt64(event.NotifyGroups, " ")
|
||||
}
|
||||
return
|
||||
}
|
||||
|
||||
func NotificaitonRecordsGetByEventId(ctx *ctx.Context, eid int64) ([]*NotificaitonRecord, error) {
|
||||
return NotificaitonRecordsGet(ctx, "event_id=?", eid)
|
||||
}
|
||||
|
||||
func NotificaitonRecordsGet(ctx *ctx.Context, where string, args ...interface{}) ([]*NotificaitonRecord, error) {
|
||||
var lst []*NotificaitonRecord
|
||||
err := DB(ctx).Where(where, args...).Find(&lst).Error
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
|
||||
return lst, nil
|
||||
}
|
||||
@@ -7,7 +7,11 @@ const NOTIFYCONTACT = "notify_contact"
|
||||
const SMTP = "smtp_config"
|
||||
const IBEX = "ibex_server"
|
||||
|
||||
var GlobalCallback = 0
|
||||
var RuleCallback = 1
|
||||
|
||||
type Webhook struct {
|
||||
Type int `json:"type"`
|
||||
Enable bool `json:"enable"`
|
||||
Url string `json:"url"`
|
||||
BasicAuthUser string `json:"basic_auth_user"`
|
||||
@@ -17,6 +21,9 @@ type Webhook struct {
|
||||
Headers []string `json:"headers_str"`
|
||||
SkipVerify bool `json:"skip_verify"`
|
||||
Note string `json:"note"`
|
||||
RetryCount int `json:"retry_count"`
|
||||
RetryInterval int `json:"retry_interval"`
|
||||
Batch int `json:"batch"`
|
||||
}
|
||||
|
||||
type NotifyScript struct {
|
||||
|
||||
93
models/prom_alert_rule.go
Normal file
93
models/prom_alert_rule.go
Normal file
@@ -0,0 +1,93 @@
|
||||
package models
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"time"
|
||||
|
||||
"github.com/toolkits/pkg/logger"
|
||||
)
|
||||
|
||||
type PromRule struct {
|
||||
Alert string `yaml:"alert,omitempty" json:"alert,omitempty"` // 报警规则的名称
|
||||
Record string `yaml:"record,omitempty" json:"record,omitempty"` // 记录规则的名称
|
||||
Expr string `yaml:"expr,omitempty" json:"expr,omitempty"` // PromQL 表达式
|
||||
For string `yaml:"for,omitempty" json:"for,omitempty"` // 告警的等待时间
|
||||
Annotations map[string]string `yaml:"annotations,omitempty" json:"annotations,omitempty"` // 规则的注释信息
|
||||
Labels map[string]string `yaml:"labels,omitempty" json:"labels,omitempty"` // 规则的标签信息
|
||||
}
|
||||
|
||||
type PromRuleGroup struct {
|
||||
Name string `yaml:"name"`
|
||||
Rules []PromRule `yaml:"rules"`
|
||||
Interval string `yaml:"interval,omitempty"`
|
||||
}
|
||||
|
||||
func convertInterval(interval string) int {
|
||||
duration, err := time.ParseDuration(interval)
|
||||
if err != nil {
|
||||
logger.Errorf("Error parsing interval `%s`,err: %v", interval, err)
|
||||
return 0
|
||||
}
|
||||
return int(duration.Seconds())
|
||||
}
|
||||
|
||||
func ConvertAlert(rule PromRule, interval string) AlertRule {
|
||||
annotations := rule.Annotations
|
||||
appendTags := []string{}
|
||||
severity := 2
|
||||
|
||||
if len(rule.Labels) > 0 {
|
||||
for k, v := range rule.Labels {
|
||||
if k != "severity" {
|
||||
appendTags = append(appendTags, fmt.Sprintf("%s=%s", k, v))
|
||||
} else {
|
||||
switch v {
|
||||
case "critical":
|
||||
severity = 1
|
||||
case "warning":
|
||||
severity = 2
|
||||
case "info":
|
||||
severity = 3
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return AlertRule{
|
||||
Name: rule.Alert,
|
||||
Severity: severity,
|
||||
Disabled: AlertRuleEnabled,
|
||||
PromForDuration: convertInterval(rule.For),
|
||||
PromQl: rule.Expr,
|
||||
PromEvalInterval: convertInterval(interval),
|
||||
EnableStimeJSON: "00:00",
|
||||
EnableEtimeJSON: "23:59",
|
||||
EnableDaysOfWeekJSON: []string{
|
||||
"1", "2", "3", "4", "5", "6", "0",
|
||||
},
|
||||
EnableInBG: AlertRuleEnableInGlobalBG,
|
||||
NotifyRecovered: AlertRuleNotifyRecovered,
|
||||
NotifyRepeatStep: AlertRuleNotifyRepeatStep60Min,
|
||||
RecoverDuration: AlertRuleRecoverDuration0Sec,
|
||||
AnnotationsJSON: annotations,
|
||||
AppendTagsJSON: appendTags,
|
||||
}
|
||||
}
|
||||
|
||||
func DealPromGroup(promRule []PromRuleGroup) []AlertRule {
|
||||
var alertRules []AlertRule
|
||||
|
||||
for _, group := range promRule {
|
||||
interval := group.Interval
|
||||
if interval == "" {
|
||||
interval = "15s"
|
||||
}
|
||||
for _, rule := range group.Rules {
|
||||
if rule.Alert != "" {
|
||||
alertRules = append(alertRules, ConvertAlert(rule, interval))
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return alertRules
|
||||
}
|
||||
84
models/prom_alert_rule_test.go
Normal file
84
models/prom_alert_rule_test.go
Normal file
@@ -0,0 +1,84 @@
|
||||
package models_test
|
||||
|
||||
import (
|
||||
"testing"
|
||||
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
"gopkg.in/yaml.v2"
|
||||
)
|
||||
|
||||
func TestConvertAlert(t *testing.T) {
|
||||
jobMissing := []models.PromRule{}
|
||||
err := yaml.Unmarshal([]byte(` - alert: PrometheusJobMissing
|
||||
expr: absent(up{job="prometheus"})
|
||||
for: 1m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
summary: Prometheus job missing (instance {{ $labels.instance }})
|
||||
description: "A Prometheus job has disappeared\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"`), &jobMissing)
|
||||
if err != nil {
|
||||
t.Errorf("Failed to Unmarshal, err: %s", err)
|
||||
}
|
||||
t.Logf("jobMissing: %+v", jobMissing[0])
|
||||
convJobMissing := models.ConvertAlert(jobMissing[0], "30s")
|
||||
if convJobMissing.PromEvalInterval != 30 {
|
||||
t.Errorf("PromEvalInterval is expected to be 30, but got %d",
|
||||
convJobMissing.PromEvalInterval)
|
||||
}
|
||||
if convJobMissing.PromForDuration != 60 {
|
||||
t.Errorf("PromForDuration is expected to be 60, but got %d",
|
||||
convJobMissing.PromForDuration)
|
||||
}
|
||||
if convJobMissing.Severity != 2 {
|
||||
t.Errorf("Severity is expected to be 2, but got %d", convJobMissing.Severity)
|
||||
}
|
||||
|
||||
ruleEvaluationSlow := []models.PromRule{}
|
||||
yaml.Unmarshal([]byte(` - alert: PrometheusRuleEvaluationSlow
|
||||
expr: prometheus_rule_group_last_duration_seconds > prometheus_rule_group_interval_seconds
|
||||
for: 180s
|
||||
labels:
|
||||
severity: info
|
||||
annotations:
|
||||
summary: Prometheus rule evaluation slow (instance {{ $labels.instance }})
|
||||
description: "Prometheus rule evaluation took more time than the scheduled interval. It indicates a slower storage backend access or too complex query.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
|
||||
`), &ruleEvaluationSlow)
|
||||
t.Logf("ruleEvaluationSlow: %+v", ruleEvaluationSlow[0])
|
||||
convRuleEvaluationSlow := models.ConvertAlert(ruleEvaluationSlow[0], "1m")
|
||||
if convRuleEvaluationSlow.PromEvalInterval != 60 {
|
||||
t.Errorf("PromEvalInterval is expected to be 60, but got %d",
|
||||
convJobMissing.PromEvalInterval)
|
||||
}
|
||||
if convRuleEvaluationSlow.PromForDuration != 180 {
|
||||
t.Errorf("PromForDuration is expected to be 180, but got %d",
|
||||
convJobMissing.PromForDuration)
|
||||
}
|
||||
if convRuleEvaluationSlow.Severity != 3 {
|
||||
t.Errorf("Severity is expected to be 3, but got %d", convJobMissing.Severity)
|
||||
}
|
||||
|
||||
targetMissing := []models.PromRule{}
|
||||
yaml.Unmarshal([]byte(` - alert: PrometheusTargetMissing
|
||||
expr: up == 0
|
||||
for: 1.5m
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
summary: Prometheus target missing (instance {{ $labels.instance }})
|
||||
description: "A Prometheus target has disappeared. An exporter might be crashed.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
|
||||
`), &targetMissing)
|
||||
t.Logf("targetMissing: %+v", targetMissing[0])
|
||||
convTargetMissing := models.ConvertAlert(targetMissing[0], "1h")
|
||||
if convTargetMissing.PromEvalInterval != 3600 {
|
||||
t.Errorf("PromEvalInterval is expected to be 3600, but got %d",
|
||||
convTargetMissing.PromEvalInterval)
|
||||
}
|
||||
if convTargetMissing.PromForDuration != 90 {
|
||||
t.Errorf("PromForDuration is expected to be 90, but got %d",
|
||||
convTargetMissing.PromForDuration)
|
||||
}
|
||||
if convTargetMissing.Severity != 1 {
|
||||
t.Errorf("Severity is expected to be 1, but got %d", convTargetMissing.Severity)
|
||||
}
|
||||
}
|
||||
@@ -26,6 +26,7 @@ type Target struct {
|
||||
HostIp string `json:"host_ip"` //ipv4,do not needs range select
|
||||
AgentVersion string `json:"agent_version"`
|
||||
EngineName string `json:"engine_name"`
|
||||
OS string `json:"os" gorm:"column:os"`
|
||||
|
||||
UnixTime int64 `json:"unixtime" gorm:"-"`
|
||||
Offset int64 `json:"offset" gorm:"-"`
|
||||
@@ -33,7 +34,6 @@ type Target struct {
|
||||
MemUtil float64 `json:"mem_util" gorm:"-"`
|
||||
CpuNum int `json:"cpu_num" gorm:"-"`
|
||||
CpuUtil float64 `json:"cpu_util" gorm:"-"`
|
||||
OS string `json:"os" gorm:"-"`
|
||||
Arch string `json:"arch" gorm:"-"`
|
||||
RemoteAddr string `json:"remote_addr" gorm:"-"`
|
||||
}
|
||||
@@ -63,6 +63,17 @@ func (t *Target) FillGroup(ctx *ctx.Context, cache map[int64]*BusiGroup) error {
|
||||
return nil
|
||||
}
|
||||
|
||||
func (t *Target) AfterFind(tx *gorm.DB) (err error) {
|
||||
delta := time.Now().Unix() - t.UpdateAt
|
||||
if delta < 60 {
|
||||
t.TargetUp = 2
|
||||
} else if delta < 180 {
|
||||
t.TargetUp = 1
|
||||
}
|
||||
t.FillTagsMap()
|
||||
return
|
||||
}
|
||||
|
||||
func TargetStatistics(ctx *ctx.Context) (*Statistics, error) {
|
||||
if !ctx.IsCenter {
|
||||
s, err := poster.GetByUrls[*Statistics](ctx, "/v1/n9e/statistic?name=target")
|
||||
@@ -111,7 +122,8 @@ func BuildTargetWhereWithQuery(query string) BuildTargetWhereOption {
|
||||
arr := strings.Fields(query)
|
||||
for i := 0; i < len(arr); i++ {
|
||||
q := "%" + arr[i] + "%"
|
||||
session = session.Where("ident like ? or note like ? or tags like ?", q, q, q)
|
||||
session = session.Where("ident like ? or note like ? or tags like ? "+
|
||||
"or os like ?", q, q, q, q)
|
||||
}
|
||||
}
|
||||
return session
|
||||
@@ -386,28 +398,16 @@ func (t *Target) DelTags(ctx *ctx.Context, tags []string) error {
|
||||
|
||||
func (t *Target) FillTagsMap() {
|
||||
t.TagsJSON = strings.Fields(t.Tags)
|
||||
t.TagsMap = make(map[string]string)
|
||||
m := make(map[string]string)
|
||||
for _, item := range t.TagsJSON {
|
||||
arr := strings.Split(item, "=")
|
||||
if len(arr) != 2 {
|
||||
continue
|
||||
}
|
||||
m[arr[0]] = arr[1]
|
||||
}
|
||||
|
||||
t.TagsMap = m
|
||||
t.TagsMap = t.GetTagsMap()
|
||||
}
|
||||
|
||||
func (t *Target) GetTagsMap() map[string]string {
|
||||
tagsJSON := strings.Fields(t.Tags)
|
||||
m := make(map[string]string)
|
||||
for _, item := range tagsJSON {
|
||||
arr := strings.Split(item, "=")
|
||||
if len(arr) != 2 {
|
||||
continue
|
||||
if arr := strings.Split(item, "="); len(arr) == 2 {
|
||||
m[arr[0]] = arr[1]
|
||||
}
|
||||
m[arr[0]] = arr[1]
|
||||
}
|
||||
return m
|
||||
}
|
||||
@@ -418,7 +418,6 @@ func (t *Target) FillMeta(meta *HostMeta) {
|
||||
t.CpuNum = meta.CpuNum
|
||||
t.UnixTime = meta.UnixTime
|
||||
t.Offset = meta.Offset
|
||||
t.OS = meta.OS
|
||||
t.Arch = meta.Arch
|
||||
t.RemoteAddr = meta.RemoteAddr
|
||||
}
|
||||
|
||||
@@ -304,6 +304,22 @@ func UserGetById(ctx *ctx.Context, id int64) (*User, error) {
|
||||
return UserGet(ctx, "id=?", id)
|
||||
}
|
||||
|
||||
func UsersGetByGroupIds(ctx *ctx.Context, groupIds []int64) ([]User, error) {
|
||||
if len(groupIds) == 0 {
|
||||
return nil, nil
|
||||
}
|
||||
|
||||
userIds, err := GroupsMemberIds(ctx, groupIds)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
users, err := UserGetsByIds(ctx, userIds)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
return users, nil
|
||||
}
|
||||
|
||||
func InitRoot(ctx *ctx.Context) {
|
||||
user, err := UserGetByUsername(ctx, "root")
|
||||
if err != nil {
|
||||
|
||||
@@ -38,6 +38,12 @@ func MemberIds(ctx *ctx.Context, groupId int64) ([]int64, error) {
|
||||
return ids, err
|
||||
}
|
||||
|
||||
func GroupsMemberIds(ctx *ctx.Context, groupIds []int64) ([]int64, error) {
|
||||
var ids []int64
|
||||
err := DB(ctx).Model(&UserGroupMember{}).Where("group_id in ?", groupIds).Pluck("user_id", &ids).Error
|
||||
return ids, err
|
||||
}
|
||||
|
||||
func UserGroupMemberCount(ctx *ctx.Context, where string, args ...interface{}) (int64, error) {
|
||||
return Count(DB(ctx).Model(&UserGroupMember{}).Where(where, args...))
|
||||
}
|
||||
|
||||
Reference in New Issue
Block a user