Compare commits

1 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Ulric Qin | 0003b33e99 | code refactor notify plugin | 2022-07-22 17:56:25 +08:00 |
139 changed files with 3216 additions and 12183 deletions

View File

@@ -22,20 +22,20 @@
## Highlighted Features
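- **Out of the box**
- Supports multiple deployment methods such as Docker, Helm Chart, and cloud services; combines data collection, monitoring and alerting, and visualization; ships with built-in dashboards, quick views, and alert rule templates that can be imported and used immediately, **greatly reducing the cost of building, learning, and using a cloud-native monitoring system**;
- **Professional alerting**
- Visual alert configuration and management, with a rich set of alert rules, configurable mute rules and subscription rules, multiple notification channels, alert self-healing, and alert event management;
- **Cloud native**
- Quickly build an enterprise-grade cloud-native monitoring system in a turnkey fashion; supports collectors such as [**Categraf**](https://github.com/flashcatcloud/categraf), Telegraf, and Grafana-agent; supports databases such as Prometheus, VictoriaMetrics, M3DB, and ElasticSearch; can import Grafana dashboards, **integrating seamlessly with the cloud-native ecosystem**;
- **High performance, high availability**
- Thanks to Nightingale's multi-data-source management engine and the sound architecture of its engine side, and backed by a high-performance time series database, it can handle collection, storage, and alert analysis for hundreds of millions of time series, saving substantial cost;
- All Nightingale components scale horizontally with no single point of failure. It has been deployed in more than a thousand companies and has withstood rigorous production use. Many leading internet companies are heavy users, running Nightingale clusters of around a hundred machines that process hundreds of millions of time series;
- **Flexible scaling, centralized management**
- Nightingale can run on a cloud host with 1 core and 1 GB of memory, as a cluster across hundreds of machines, or in K8s; components such as the time series database and the alert engine can also be pushed down to individual data centers and regions, balancing edge deployment with unified central management and **solving the problem of fragmented data and the lack of a unified view**;
- Supports multiple deployment methods such as Docker and Helm Chart; ships with built-in dashboards, quick views, and alert rule templates that can be imported and used immediately; active, professional community users keep iterating and contributing more best practices to the product;
- **Compatible and inclusive**
- Supports collectors such as [Categraf](https://github.com/flashcatcloud/categraf), Telegraf, and Grafana-agent; supports time series databases such as Prometheus, VictoriaMetrics, and M3DB; connects to Grafana and integrates seamlessly with the cloud-native ecosystem;
- Combines data collection, visualization, monitoring and alerting, and data analysis; integrates tightly with the cloud-native ecosystem and provides out-of-the-box enterprise-grade monitoring, analysis, and alerting;
- **Open community**
- Hosted by the [China Computer Federation Open Source Development Committee](https://www.ccf.org.cn/kyfzwyh/). Sustained investment from [**快猫星云**](https://flashcat.cloud) and many other companies, the active participation of thousands of community users, and the project's clear positioning all ensure the healthy, long-term development of the Nightingale open source community. Active, professional community users also keep iterating and contributing more best practices to the product;
- Hosted by the [China Computer Federation Open Source Development Committee](https://www.ccf.org.cn/kyfzwyh/). Sustained investment from [快猫星云](https://flashcat.cloud), the active participation of thousands of community users, and the project's clear positioning all ensure the healthy, long-term development of the Nightingale open source community;
- **High performance**
- Thanks to Nightingale's multi-data-source management engine and the sound architecture of its engine side, and backed by a high-performance time series database, it can handle collection, storage, and alert analysis for hundreds of millions of time series, saving substantial cost;
- **High availability**
- All Nightingale components scale horizontally with no single point of failure. It has been deployed in more than a thousand companies and has withstood rigorous production use. Many leading internet companies are heavy users, running Nightingale clusters of around a hundred machines that process billions of time series;
- **Flexible scaling**
- Nightingale can run on a cloud host with 1 core and 1 GB of memory, as a cluster across hundreds of machines, or in K8s; components such as the time series database and the alert engine can also be pushed down to individual data centers and regions, balancing edge deployment with centralized management;
> If you run into one or more of the following needs while using Prometheus, we recommend upgrading seamlessly to Nightingale:
> If you run into one or more of the following needs while using Prometheus, we recommend upgrading to Nightingale:
- Prometheus, Alertmanager, Grafana, and other systems are fragmented, lack a unified view, and do not work out of the box;
- Managing Prometheus and Alertmanager by editing configuration files has a steep learning curve and makes collaboration difficult;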
@@ -50,7 +50,7 @@
> If you are using [Open-Falcon](https://github.com/open-falcon/falcon-plus), we recommend even more strongly that you upgrade to Nightingale:
- For a detailed introduction to Open-Falcon and Nightingale, see [Ten Characteristics and Trends of Cloud-Native Monitoring](https://mp.weixin.qq.com/s?__biz=MzkzNjI5OTM5Nw==&mid=2247483738&idx=1&sn=e8bdbb974a2cd003c1abcc2b5405dd18&chksm=c2a19fb0f5d616a63185cd79277a79a6b80118ef2185890d0683d2bb20451bd9303c78d083c5#rd).
- For a detailed introduction to Open-Falcon and Nightingale, see [Ten Characteristics and Trends of Cloud-Native Monitoring](https://mp.weixin.qq.com/s?__biz=MzkzNjI5OTM5Nw==&mid=2247483738&idx=1&sn=e8bdbb974a2cd003c1abcc2b5405dd18&chksm=c2a19fb0f5d616a63185cd79277a79a6b80118ef2185890d0683d2bb20451bd9303c78d083c5#rd).
> We recommend [Categraf](https://github.com/flashcatcloud/categraf) as your preferred monitoring data collector:
@@ -59,33 +59,34 @@
## Getting Started
- [Documentation (international site)](https://n9e.github.io/)
- [Documentation (China site)](http://n9e.flashcat.cloud/)
- [Quick install](https://mp.weixin.qq.com/s/iEC4pfL1TgjMDOWYh8H-FA)
- [Detailed documentation](https://n9e.github.io/)
- [Community talks](https://n9e.github.io/docs/prologue/share/)
## Screenshots
<img src="doc/img/intro.gif" width="480">
<img src="doc/img/intro.gif" width="680">
## Architecture
<img src="doc/img/arch-product.png" width="480">
<img src="doc/img/arch-product.png" width="680">
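Nightingale can receive monitoring data reported by various collectors (such as [Categraf](https://github.com/flashcatcloud/categraf), telegraf, grafana-agent, and Prometheus) and write it to a variety of popular time series databases (Prometheus, M3DB, VictoriaMetrics, Thanos, TDEngine, and others are supported). It provides configuration of alert rules, mute rules, and subscription rules; viewing of monitoring data; an alert self-healing mechanism (after an alert fires, it automatically calls back a webhook URL or runs a script); and storage, management, and grouped viewing of historical alert events.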
<img src="doc/img/arch-system.png" width="480">
<img src="doc/img/arch-system.png" width="680">
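The design of Nightingale v5 is very simple. The core consists of two modules, server and webapi. webapi is stateless, sits at the central side, handles front-end requests, and writes user configuration to the database. server is the alert engine and data forwarding module. It generally goes with the time series database: one set of server per time series database, where each set can be a single instance or a cluster of several instances. server can receive data reported by Categraf, Telegraf, Grafana-Agent, Datadog-Agent, and Falcon-Plugins, writes it to the backend time series database, periodically syncs alert rules from the database, and then queries the time series database to evaluate alerts. Each set of server depends on one redis.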
<img src="doc/img/install-vm.png" width="480">
<img src="doc/img/install-vm.png" width="680">
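If a single-node time series database (such as Prometheus) hits a performance bottleneck or has poor disaster recovery, we recommend [VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics). VictoriaMetrics has a fairly simple architecture, performs very well, and is easy to deploy and operate; the architecture diagram is shown above. For more detailed VictoriaMetrics documentation, see its [official site](https://victoriametrics.com/).
If single-node Prometheus lacks performance or has poor disaster recovery, we recommend [VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics). VictoriaMetrics has a fairly simple architecture, performs very well, and is easy to deploy and operate; the architecture diagram is shown above. For more detailed VictoriaMetrics documentation, see its [official site](https://victoriametrics.com/).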
## Community
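For an open source project to thrive, it needs an open governance structure and a steady flow of developers and users taking part. We are committed to building an open, neutral governance structure and to bringing in developers from companies, universities, and elsewhere who are interested in and enthusiastic about cloud-native monitoring, so that together we build a vibrant Nightingale open source community. For the "Nightingale Open Source Project and Community Governance Structure (Draft)", see [COMMUNITY GOVERNANCE](./doc/community-governance.md).
For an open source project to thrive, it needs an open governance structure and a steady flow of developers and users taking part. We are committed to building an open, neutral governance structure and to bringing in developers from companies, universities, and elsewhere who are interested in and enthusiastic about cloud-native monitoring, so that together we build a vibrant Nightingale open source community. For the "Nightingale Open Source Project and Community Governance Structure (Draft)", see **[COMMUNITY GOVERNANCE](./doc/community-governance.md)**.
**We welcome you to take part in the Nightingale open source project and community in any way, including but not limited to:**
- Adding to and improving the documentation => [n9e.github.io](https://n9e.github.io/)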
@@ -95,8 +96,9 @@
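**Respecting, recognizing, and recording the work of every contributor** is the first guiding principle of the Nightingale open source community. We encourage **asking questions efficiently**, which both respects developers' time and contributes to the community's accumulated knowledge:
- Before asking, please check the [FAQ](https://www.gitlink.org.cn/ccfos/nightingale/wiki/faq)
- We use [GitHub Discussions](https://github.com/ccfos/nightingale/discussions) as the forum; you can search and ask questions there
- We also recommend joining the WeChat group to exchange experience with other Nightingale users (first add [UlricGO](https://www.gitlink.org.cn/UlricQin/gist/tree/master/self.jpeg) as a friend, with the note: "夜莺加群" + your name + company)
- Before asking, please search the [github issue](https://github.com/ccfos/nightingale/issues) list
- We prefer that questions be raised as github issues: [click here to report a problem](https://github.com/ccfos/nightingale/issues/new?assignees=&labels=kind%2Fbug&template=bug_report.yml) | [click here to suggest a feature](https://github.com/ccfos/nightingale/issues/new?assignees=&labels=kind%2Ffeature&template=enhancement.md)
- Finally, we recommend joining the WeChat group for open-ended questions and discussion (first add [UlricGO](https://www.gitlink.org.cn/UlricQin/gist/tree/master/self.jpeg) as a friend, with the note: "夜莺加群" + your name + company; the developer team and knowledgeable, helpful members answer questions in the group)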
## Who is using
@@ -117,4 +119,4 @@
## Contact Us
We recommend following the Nightingale WeChat official account for timely product and community updates:
<img src="doc/img/n9e-vx-new.png" width="120">
<img src="doc/img/n9e-vx-new.png" width="180">

View File

@@ -1,7 +0,0 @@
## Active Contributors
- [xiaoziv](https://github.com/xiaoziv)
- [tanxiao1990](https://github.com/tanxiao1990)
- [bbaobelief](https://github.com/bbaobelief)
- [freedomkk-qfeng](https://github.com/freedomkk-qfeng)
- [lsy1990](https://github.com/lsy1990)

View File

@@ -1,5 +0,0 @@
## Committers
- [YeningQin](https://github.com/710leo)
- [FeiKong](https://github.com/kongfei605)
- [XiaqingDai](https://github.com/jsers)

View File

@@ -1,36 +1,29 @@
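[Nightingale](https://github.com/ccfos/nightingale "Nightingale") is an open source cloud-native monitoring system designed and developed by DiDi. Since it was open sourced in March 2020, its product design, flexible architecture, and clear positioning have made it the most active enterprise-grade cloud-native monitoring solution in China. As of now (August 2022), it has shipped more than **70** releases on [Github](https://github.com/ccfos/nightingale "Github"), earned more than **5K** stars, and attracted more than **80** code contributors. Rapid iteration has also kept growing its user base across many industries.
# Nightingale Open Source Project and Community Governance Structure (Draft)
Furthermore, on May 11, 2022, Nightingale was formally donated to the China Computer Federation Open Source Development Committee, [CCF ODC](https://www.ccf.org.cn/kyfzwyh/ "CCF ODC"), becoming the first open source project CCF ODC accepted as a donation after its founding.
## Community Structure
For an open source project to thrive, it needs an open governance structure and a steady flow of participating developers. Having joined the CCF open source family, and with the support and leadership of the CCF, the Nightingale project can better address the needs of technology trends such as cloud native, observability, and domestic substitution, establish an open, neutral governance structure, and build a more professional, vibrant developer community.
### User
**Today we formally publish the governance structure of the Nightingale open source community and announce the related appointments and community honors. We look forward to traveling the open source road together.**
> Any individual, company, or organization is welcome to use Nightingale, report bugs, submit feature requests, and help one another. We recommend using [github issue](https://github.com/ccfos/nightingale/issues) to track bugs and manage requirements.
# Nightingale Open Source Community Structure
Community users who register their usage at **[Who is Using Nightingale](https://github.com/ccfos/nightingale/issues/897)** and share their experience with Nightingale are automatically added to the **[END USERS](./end-users.md)** list and receive **VIP Support** from the community.
### User
### Contributor
> Any individual, company, or organization is welcome to use Nightingale, report bugs, submit feature requests, and help one another. We recommend using [Github Issue](https://github.com/ccfos/nightingale/issues "Github Issue") to track bugs and manage requirements.
> Every user is welcome to take part in and contribute to the Nightingale open source community in ways including, but not limited to, the following:
Community users who register their usage at **[Who is Using Nightingale](https://github.com/ccfos/nightingale/issues/897 "Who is Using Nightingale")** and share their experience with Nightingale are automatically added to the **[END USERS](https://github.com/ccfos/nightingale/blob/main/doc/end-users.md "END USERS")** file list and receive **VIP Support** from the community.
### Contributor
> Every user is welcome to take part in and contribute to the Nightingale open source community in ways including, but not limited to, the following: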
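1. Take an active part in discussions in [Github Issue](https://github.com/ccfos/nightingale/issues "Github Issue") and in community activities;
1. Take an active part in discussions in [github issue](https://github.com/ccfos/nightingale/issues) and in community activities;
1. Submit code patches;
1. Translate, revise, extend, and improve the [documentation](https://n9e.github.io "documentation");
1. Translate, revise, extend, and improve the [documentation](https://n9e.github.io);
1. Share your experience with Nightingale and actively evangelize it;
1. Submit suggestions and criticism;
Anyone who has **5** PRs merged into [CCFOS/NIGHTINGALE](https://github.com/ccfos/nightingale "CCFOS/NIGHTINGALE") within a year, or whose other contributions are unanimously recognized by the **Project Management Committee**, is automatically added to the **[ACTIVE CONTRIBUTORS](https://github.com/ccfos/nightingale/blob/main/doc/active-contributors.md "ACTIVE CONTRIBUTORS")** list, receives a certificate issued by the Nightingale open source community, and enjoys certain rights and benefits in the community.
Anyone who has **5** PRs merged into [CCFOS/NIGHTINGALE](https://github.com/ccfos/nightingale) within a year, or whose other contributions are unanimously recognized by the **Project Management Committee**, is automatically added to the **[ACTIVE CONTRIBUTORS](./active-contributors.md)** list, receives an electronic certificate issued by **[CCF ODC](https://www.ccf.org.cn/kyfzwyh/)**, and enjoys certain rights and benefits in the community.
Every Contributor who has had a PR merged into [CCFOS/NIGHTINGALE](https://github.com/ccfos/nightingale "CCFOS/NIGHTINGALE") or has made an important contribution is permanently recorded in the [CONTRIBUTORS](https://github.com/ccfos/nightingale/blob/main/doc/contributors.md "CONTRIBUTORS") list.
### Committer
### Committer
> A Committer is a contributor with write access to the [CCFOS/NIGHTINGALE](https://github.com/ccfos/nightingale "CCFOS/NIGHTINGALE") code repository. In principle, a Committer can decide independently whether a code patch may be merged into the Nightingale repository, but the Project Management Committee has the final say.
> A Committer is a contributor with write access to the [CCFOS/NIGHTINGALE](https://github.com/ccfos/nightingale) code repository and an email address with the ccf.org.cn suffix (not yet available). In principle, a Committer can decide independently whether a code patch may be merged into the Nightingale repository, but the Project Management Committee has the final say.
A Committer takes on one or more of the following responsibilities:
- Respond actively to Issues;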
@@ -38,43 +31,44 @@ Committer 承担以下一个或多个职责:
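- Attend regular developer meetings and actively discuss project planning and technical designs;
- Represent the Nightingale open source community at relevant technical conferences and give talks;
Committers are recorded and published in the **[COMMITTERS](https://github.com/ccfos/nightingale/blob/main/doc/committers.md "COMMITTERS")** list, receive a certificate issued by the Nightingale open source community, and enjoy the community's rights and benefits.
Committers are recorded and published in the **[COMMITTERS](./committers.md)** list, receive an electronic certificate issued by **[CCF ODC](https://www.ccf.org.cn/kyfzwyh/)**, and enjoy the community's rights and benefits.
### PMC (Project Management Committee)
### PMC Member
> The PMC (Project Management Committee), as an entity, manages and leads the Nightingale project and is fully responsible for the development of the whole project. PMC information is recorded and published in the [PMC](https://github.com/ccfos/nightingale/blob/main/doc/pmc.md "PMC") file.
> PMC Members are elected from among Contributors or Committers. They have write access to the [CCFOS/NIGHTINGALE](https://github.com/ccfos/nightingale) code repository, an email address with the ccf.org.cn suffix (not yet available), the right to vote on Nightingale community matters, and the right to nominate Committer candidates. The PMC, as an entity, is fully responsible for the development of the whole project. PMC Members are recorded and published in the **[PMC](./pmc.md)** list.
- PMC Members are elected from among Contributors or Committers. They have write access to the [CCFOS/NIGHTINGALE](https://github.com/ccfos/nightingale "CCFOS/NIGHTINGALE") code repository, the right to vote on Nightingale community matters, and the right to nominate Committer candidates.
- The PMC Chair is elected by vote from among the PMC Members. The Chair is the bridge between **[CCF ODC](https://www.ccf.org.cn/kyfzwyh/ "CCF ODC")** and the PMC and carries out specific project management duties.
### PMC Chair
## Communication
> The PMC Chair is appointed by **[CCF ODC](https://www.ccf.org.cn/kyfzwyh/)** from among the PMC Members. The PMC, as a single entity, manages and leads the Nightingale project. The Chair is the bridge between CCF ODC and the PMC and carries out specific project management duties.
## Communication
1. We recommend using the mailing list to send suggestions (to be announced);
2. We recommend using [Github Issue](https://github.com/ccfos/nightingale/issues "Github Issue") to track bugs and manage requirements;
3. We recommend using [Github Milestone](https://github.com/ccfos/nightingale/milestones "Github Milestone") to manage project progress and planning;
2. We recommend using [github issue](https://github.com/ccfos/nightingale/issues) to track bugs and manage requirements;
3. We recommend using [github milestone](https://github.com/ccfos/nightingale/milestones) to manage project progress and planning;
4. We recommend using Tencent Meeting for regular project meetings (meeting ID to be announced);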
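## Documentation
1. We recommend using [Github Pages](https://n9e.github.io "Github Pages") to maintain documentation;
2. We recommend using [Gitlink Wiki](https://www.gitlink.org.cn/ccfos/nightingale/wiki/faq "Gitlink Wiki") to maintain the FAQ;
## Documentation
1. We recommend using [github pages](https://n9e.github.io) to maintain documentation;
2. We recommend using [gitlink wiki](https://www.gitlink.org.cn/ccfos/nightingale/wiki/faq) to maintain the FAQ;
## Operation
## Operation
1. We regularly hold meetings among users, contributors, and PMC members to discuss development goals, designs, and progress, as well as topics such as the soundness and priority of requirements;
2. We regularly organize meetups (online and offline) to create a good environment for users to exchange and share, and we archive the content on the documentation site;
3. We regularly hold the Nightingale developer conference to share [best user story](https://n9e.github.io/docs/prologue/share/ "best user story"), align on annual development goals and plans, and discuss new technical directions;
3. We regularly hold the Nightingale developer conference to share best user story, align on annual development goals and plans, and discuss new technical directions;
## Philosophy (Community Guiding Principles)
## Community Guiding Principles (Philosophy)
> Respect, recognize, and record the work of every contributor.
**Respect, recognize, and record the work of every contributor.**
## Principles for Asking Questions
In line with the principle of **respecting, recognizing, and recording the work of every contributor**, we encourage **asking questions efficiently**, which both respects developers' time and contributes to the community's accumulated knowledge:
1. Before asking, please check the [FAQ](https://www.gitlink.org.cn/ccfos/nightingale/wiki/faq "FAQ")
2. Before asking, please search [Github Issues](https://github.com/ccfos/nightingale/issues "Github Issue")
3. We prefer that questions be raised as a [Github Issue](https://github.com/ccfos/nightingale/issues "Github Issue"): [click here to report a problem](https://github.com/ccfos/nightingale/issues/new?assignees=&labels=kind%2Fbug&template=bug_report.yml "click here to report a problem") | [click here to suggest a feature](https://github.com/ccfos/nightingale/issues/new?assignees=&labels=kind%2Ffeature&template=enhancement.md "click here to suggest a feature")
1. Before asking, please check the [FAQ](https://www.gitlink.org.cn/ccfos/nightingale/wiki/faq)
2. Before asking, please search the [github issue](https://github.com/ccfos/nightingale/issues) list
3. We prefer that questions be raised as github issues: [click here to report a problem](https://github.com/ccfos/nightingale/issues/new?assignees=&labels=kind%2Fbug&template=bug_report.yml) | [click here to suggest a feature](https://github.com/ccfos/nightingale/issues/new?assignees=&labels=kind%2Ffeature&template=enhancement.md)
Finally, we recommend joining the WeChat group for open-ended questions and discussion (first add [UlricGO](https://www.gitlink.org.cn/UlricQin/gist/tree/master/self.jpeg "UlricGO") as a friend, with the note: "夜莺加群" + your name + company; the developer team and knowledgeable, helpful members answer questions in the group)
Finally, we recommend joining the WeChat group for open-ended questions and discussion (first add [UlricGO](https://www.gitlink.org.cn/UlricQin/gist/tree/master/self.jpeg) as a friend, with the note: "夜莺加群" + your name + company; the developer team and knowledgeable, helpful members answer questions in the group)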

View File

@@ -1,5 +0,0 @@
## Contributors
<a href="https://github.com/ccfos/nightingale/graphs/contributors">
<img src="https://contrib.rocks/image?repo=ccfos/nightingale" />
</a>

View File

@@ -1,5 +0,0 @@
## End Users
- [中移动](https://github.com/ccfos/nightingale/issues/897#issuecomment-1086573166)
- [inke](https://github.com/ccfos/nightingale/issues/897#issuecomment-1099840636)
- [方正证券](https://github.com/ccfos/nightingale/issues/897#issuecomment-1110492461)

View File

@@ -1,7 +0,0 @@
### PMC Chair
- [laiwei](https://github.com/laiwei)
### PMC Co-Chair
- [UlricQin](https://github.com/UlricQin)
### PMC Member

View File

@@ -1,5 +1,4 @@
FROM python:2.7.8-slim
#FROM python:2
FROM python:2
#FROM ubuntu:21.04
WORKDIR /app

View File

@@ -1,4 +1,4 @@
FROM --platform=$BUILDPLATFORM python:2.7.8-slim
FROM --platform=$BUILDPLATFORM python:2
WORKDIR /app

View File

@@ -43,9 +43,3 @@ basic_auth_pass = ""
timeout = 5000
dial_timeout = 2500
max_idle_conns_per_host = 100
[http]
enable = false
address = ":9100"
print_access = false
run_mode = "release"

View File

@@ -80,7 +80,7 @@ services:
sh -c "/wait && /app/ibex server"
nwebapi:
image: flashcatcloud/nightingale:latest
image: ulric2019/nightingale:5.9.4
container_name: nwebapi
hostname: nwebapi
restart: always
@@ -108,7 +108,7 @@ services:
sh -c "/wait && /app/n9e webapi"
nserver:
image: flashcatcloud/nightingale:latest
image: ulric2019/nightingale:5.9.4
container_name: nserver
hostname: nserver
restart: always
@@ -136,7 +136,7 @@ services:
sh -c "/wait && /app/n9e server"
categraf:
image: "flashcatcloud/categraf:latest"
image: "flashcatcloud/categraf:v0.1.9"
container_name: "categraf"
hostname: "categraf01"
restart: always
@@ -150,7 +150,7 @@ services:
- /:/hostfs
- /var/run/docker.sock:/var/run/docker.sock
ports:
- "9100:9100/tcp"
- "8094:8094/tcp"
networks:
- nightingale
depends_on:

View File

@@ -41,12 +41,10 @@ CREATE TABLE `user_group` (
insert into user_group(id, name, create_at, create_by, update_at, update_by) values(1, 'demo-root-group', unix_timestamp(now()), 'root', unix_timestamp(now()), 'root');
CREATE TABLE `user_group_member` (
`id` bigint unsigned not null auto_increment,
`group_id` bigint unsigned not null,
`user_id` bigint unsigned not null,
KEY (`group_id`),
KEY (`user_id`),
PRIMARY KEY(`id`)
KEY (`user_id`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4;
insert into user_group_member(group_id, user_id) values(1, 1);
@@ -54,7 +52,7 @@ insert into user_group_member(group_id, user_id) values(1, 1);
CREATE TABLE `configs` (
`id` bigint unsigned not null auto_increment,
`ckey` varchar(191) not null,
`cval` varchar(4096) not null default '',
`cval` varchar(1024) not null default '',
PRIMARY KEY (`id`),
UNIQUE KEY (`ckey`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4;
@@ -72,12 +70,10 @@ insert into `role`(name, note) values('Standard', 'Ordinary user role');
insert into `role`(name, note) values('Guest', 'Readonly user role');
CREATE TABLE `role_operation`(
`id` bigint unsigned not null auto_increment,
`role_name` varchar(128) not null,
`operation` varchar(191) not null,
KEY (`role_name`),
KEY (`operation`),
PRIMARY KEY(`id`)
KEY (`operation`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4;
-- Admin is special, who has no concrete operation but can do anything.
@@ -165,16 +161,13 @@ CREATE TABLE `board` (
`id` bigint unsigned not null auto_increment,
`group_id` bigint not null default 0 comment 'busi group id',
`name` varchar(191) not null,
`ident` varchar(200) not null default '',
`tags` varchar(255) not null comment 'split by space',
`public` tinyint(1) not null default 0 comment '0:false 1:true',
`create_at` bigint not null default 0,
`create_by` varchar(64) not null default '',
`update_at` bigint not null default 0,
`update_by` varchar(64) not null default '',
PRIMARY KEY (`id`),
UNIQUE KEY (`group_id`, `name`),
KEY(`ident`)
UNIQUE KEY (`group_id`, `name`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4;
-- for dashboard new version
@@ -233,7 +226,6 @@ CREATE TABLE `chart_share` (
CREATE TABLE `alert_rule` (
`id` bigint unsigned not null auto_increment,
`group_id` bigint not null default 0 comment 'busi group id',
`cate` varchar(128) not null,
`cluster` varchar(128) not null,
`name` varchar(255) not null,
`note` varchar(1024) not null default '',
@@ -272,18 +264,13 @@ CREATE TABLE `alert_mute` (
`id` bigint unsigned not null auto_increment,
`group_id` bigint not null default 0 comment 'busi group id',
`prod` varchar(255) not null default '',
`note` varchar(1024) not null default '',
`cate` varchar(128) not null,
`cluster` varchar(128) not null,
`tags` varchar(4096) not null default '' comment 'json,map,tagkey->regexp|value',
`cause` varchar(255) not null default '',
`btime` bigint not null default 0 comment 'begin time',
`etime` bigint not null default 0 comment 'end time',
`disabled` tinyint(1) not null default 0 comment '0:enabled 1:disabled',
`create_at` bigint not null default 0,
`create_by` varchar(64) not null default '',
`update_at` bigint not null default 0,
`update_by` varchar(64) not null default '',
PRIMARY KEY (`id`),
KEY (`create_at`),
KEY (`group_id`)
@@ -291,10 +278,7 @@ CREATE TABLE `alert_mute` (
CREATE TABLE `alert_subscribe` (
`id` bigint unsigned not null auto_increment,
`name` varchar(255) not null default '',
`disabled` tinyint(1) not null default 0 comment '0:enabled 1:disabled',
`group_id` bigint not null default 0 comment 'busi group id',
`cate` varchar(128) not null,
`cluster` varchar(128) not null,
`rule_id` bigint not null default 0,
`tags` varchar(4096) not null default '' comment 'json,map,tagkey->regexp|value',
@@ -366,7 +350,7 @@ CREATE TABLE `recording_rule` (
`cluster` varchar(128) not null,
`name` varchar(255) not null comment 'new metric name',
`note` varchar(255) not null comment 'rule note',
`disabled` tinyint(1) not null default 0 comment '0:enabled 1:disabled',
`disabled` tinyint(1) not null comment '0:enabled 1:disabled',
`prom_ql` varchar(8192) not null comment 'promql',
`prom_eval_interval` int not null comment 'evaluate interval',
`append_tags` varchar(255) default '' comment 'split by space: service=n9e mod=api',
@@ -396,7 +380,6 @@ insert into alert_aggr_view(name, rule, cate) values('By RuleName', 'field:rule_
CREATE TABLE `alert_cur_event` (
`id` bigint unsigned not null comment 'use alert_his_event.id',
`cate` varchar(128) not null,
`cluster` varchar(128) not null,
`group_id` bigint unsigned not null comment 'busi group id of rule',
`group_name` varchar(255) not null default '' comment 'busi group name',
@@ -419,7 +402,6 @@ CREATE TABLE `alert_cur_event` (
`notify_cur_number` int not null default 0 comment '',
`target_ident` varchar(191) not null default '' comment 'target ident, also in tags',
`target_note` varchar(191) not null default '' comment 'target note',
`first_trigger_time` bigint,
`trigger_time` bigint not null,
`trigger_value` varchar(255) not null,
`tags` varchar(1024) not null default '' comment 'merge data_tags rule_tags, split by ,,',
@@ -433,7 +415,6 @@ CREATE TABLE `alert_cur_event` (
CREATE TABLE `alert_his_event` (
`id` bigint unsigned not null AUTO_INCREMENT,
`is_recovered` tinyint(1) not null,
`cate` varchar(128) not null,
`cluster` varchar(128) not null,
`group_id` bigint unsigned not null comment 'busi group id of rule',
`group_name` varchar(255) not null default '' comment 'busi group name',
@@ -455,7 +436,6 @@ CREATE TABLE `alert_his_event` (
`notify_cur_number` int not null default 0 comment '',
`target_ident` varchar(191) not null default '' comment 'target ident, also in tags',
`target_note` varchar(191) not null default '' comment 'target note',
`first_trigger_time` bigint,
`trigger_time` bigint not null,
`trigger_value` varchar(255) not null,
`recover_time` bigint not null default 0,
@@ -518,12 +498,3 @@ CREATE TABLE `task_record`
KEY (`create_at`, `group_id`),
KEY (`create_by`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4;
CREATE TABLE `alerting_engines`
(
`id` int unsigned NOT NULL AUTO_INCREMENT,
`instance` varchar(128) not null default '' comment 'instance identification, e.g. 10.9.0.9:9090',
`cluster` varchar(128) not null default '' comment 'target reader cluster',
`clock` bigint not null,
PRIMARY KEY (`id`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4;

View File

@@ -43,11 +43,9 @@ CREATE INDEX user_group_update_at_idx ON user_group (update_at);
insert into user_group(id, name, create_at, create_by, update_at, update_by) values(1, 'demo-root-group', date_part('epoch',current_timestamp)::int, 'root', date_part('epoch',current_timestamp)::int, 'root');
CREATE TABLE user_group_member (
id bigserial,
group_id bigint not null,
user_id bigint not null
) ;
ALTER TABLE user_group_member ADD CONSTRAINT user_group_member_pk PRIMARY KEY (id);
CREATE INDEX user_group_member_group_id_idx ON user_group_member (group_id);
CREATE INDEX user_group_member_user_id_idx ON user_group_member (user_id);
@@ -56,7 +54,7 @@ insert into user_group_member(group_id, user_id) values(1, 1);
CREATE TABLE configs (
id bigserial,
ckey varchar(191) not null,
cval varchar(4096) not null default ''
cval varchar(1024) not null default ''
) ;
ALTER TABLE configs ADD CONSTRAINT configs_pk PRIMARY KEY (id);
ALTER TABLE configs ADD CONSTRAINT configs_un UNIQUE (ckey);
@@ -74,11 +72,9 @@ insert into role(name, note) values('Standard', 'Ordinary user role');
insert into role(name, note) values('Guest', 'Readonly user role');
CREATE TABLE role_operation(
id bigserial,
role_name varchar(128) not null,
operation varchar(191) not null
) ;
ALTER TABLE role_operation ADD CONSTRAINT role_operation_pk PRIMARY KEY (id);
CREATE INDEX role_operation_role_name_idx ON role_operation (role_name);
CREATE INDEX role_operation_operation_idx ON role_operation (operation);
@@ -198,9 +194,7 @@ CREATE TABLE board (
id bigserial not null ,
group_id bigint not null default 0 ,
name varchar(191) not null,
ident varchar(200) not null default '',
tags varchar(255) not null ,
public smallint not null default 0,
create_at bigint not null default 0,
create_by varchar(64) not null default '',
update_at bigint not null default 0,
@@ -210,8 +204,6 @@ ALTER TABLE board ADD CONSTRAINT board_pk PRIMARY KEY (id);
ALTER TABLE board ADD CONSTRAINT board_un UNIQUE (group_id,"name");
COMMENT ON COLUMN board.group_id IS 'busi group id';
COMMENT ON COLUMN board.tags IS 'split by space';
COMMENT ON COLUMN board.public IS '0:false 1:true';
CREATE INDEX board_ident_idx ON board (ident);
-- for dashboard new version
CREATE TABLE board_payload (
@@ -267,7 +259,6 @@ CREATE INDEX chart_share_create_at_idx ON chart_share (create_at);
CREATE TABLE alert_rule (
id bigserial NOT NULL,
group_id int8 NOT NULL DEFAULT 0,
cate varchar(128) not null default '' ,
"cluster" varchar(128) NOT NULL,
"name" varchar(255) NOT NULL,
note varchar(1024) NOT NULL,
@@ -323,19 +314,14 @@ COMMENT ON COLUMN alert_rule.append_tags IS 'split by space: service=n9e mod=api
CREATE TABLE alert_mute (
id bigserial,
group_id bigint not null default 0 ,
cate varchar(128) not null default '' ,
prod varchar(255) NOT NULL DEFAULT '' ,
note varchar(1024) not null default '',
cluster varchar(128) not null,
tags varchar(4096) not null default '' ,
cause varchar(255) not null default '',
btime bigint not null default 0 ,
etime bigint not null default 0 ,
disabled smallint not null default 0 ,
create_at bigint not null default 0,
create_by varchar(64) not null default '',
update_at bigint not null default 0,
update_by varchar(64) not null default ''
create_by varchar(64) not null default ''
) ;
ALTER TABLE alert_mute ADD CONSTRAINT alert_mute_pk PRIMARY KEY (id);
CREATE INDEX alert_mute_create_at_idx ON alert_mute (create_at);
@@ -344,14 +330,10 @@ COMMENT ON COLUMN alert_mute.group_id IS 'busi group id';
COMMENT ON COLUMN alert_mute.tags IS 'json,map,tagkey->regexp|value';
COMMENT ON COLUMN alert_mute.btime IS 'begin time';
COMMENT ON COLUMN alert_mute.etime IS 'end time';
COMMENT ON COLUMN alert_mute.disabled IS '0:enabled 1:disabled';
CREATE TABLE alert_subscribe (
id bigserial,
"name" varchar(255) NOT NULL default '',
disabled int2 NOT NULL default 0 ,
group_id bigint not null default 0 ,
cate varchar(128) not null default '' ,
cluster varchar(128) not null,
rule_id bigint not null default 0,
tags jsonb not null ,
@@ -368,7 +350,6 @@ CREATE TABLE alert_subscribe (
ALTER TABLE alert_subscribe ADD CONSTRAINT alert_subscribe_pk PRIMARY KEY (id);
CREATE INDEX alert_subscribe_group_id_idx ON alert_subscribe (group_id);
CREATE INDEX alert_subscribe_update_at_idx ON alert_subscribe (update_at);
COMMENT ON COLUMN alert_subscribe.disabled IS '0:enabled 1:disabled';
COMMENT ON COLUMN alert_subscribe.group_id IS 'busi group id';
COMMENT ON COLUMN alert_subscribe.tags IS 'json,map,tagkey->regexp|value';
COMMENT ON COLUMN alert_subscribe.redefine_severity IS 'is redefine severity?';
@@ -435,7 +416,6 @@ insert into alert_aggr_view(name, rule, cate) values('By RuleName', 'field:rule_
CREATE TABLE alert_cur_event (
id bigserial NOT NULL,
cate varchar(128) not null default '' ,
"cluster" varchar(128) NOT NULL,
group_id int8 NOT NULL,
group_name varchar(255) NOT NULL DEFAULT ''::character varying,
@@ -456,7 +436,6 @@ CREATE TABLE alert_cur_event (
notify_cur_number int4 not null default 0,
target_ident varchar(191) NOT NULL DEFAULT ''::character varying,
target_note varchar(191) NOT NULL DEFAULT ''::character varying,
first_trigger_time int8,
trigger_time int8 NOT NULL,
trigger_value varchar(255) NOT NULL,
tags varchar(1024) NOT NULL DEFAULT ''::character varying,
@@ -489,7 +468,6 @@ COMMENT ON COLUMN alert_cur_event.tags IS 'merge data_tags rule_tags, split by ,
CREATE TABLE alert_his_event (
id bigserial NOT NULL,
is_recovered int2 NOT NULL,
cate varchar(128) not null default '' ,
"cluster" varchar(128) NOT NULL,
group_id int8 NOT NULL,
group_name varchar(255) NOT NULL DEFAULT ''::character varying,
@@ -509,7 +487,6 @@ CREATE TABLE alert_his_event (
notify_cur_number int4 not null default 0,
target_ident varchar(191) NOT NULL DEFAULT ''::character varying,
target_note varchar(191) NOT NULL DEFAULT ''::character varying,
first_trigger_time int8,
trigger_time int8 NOT NULL,
trigger_value varchar(255) NOT NULL,
recover_time int8 NOT NULL DEFAULT 0,
@@ -601,14 +578,3 @@ CREATE INDEX task_record_create_by_idx ON task_record (create_by);
COMMENT ON COLUMN task_record.id IS 'ibex task id';
COMMENT ON COLUMN task_record.group_id IS 'busi group id';
CREATE TABLE alerting_engines
(
id bigserial NOT NULL,
instance varchar(128) not null default '',
cluster varchar(128) not null default '',
clock bigint not null
) ;
ALTER TABLE alerting_engines ADD CONSTRAINT alerting_engines_pk PRIMARY KEY (id);
COMMENT ON COLUMN alerting_engines.instance IS 'instance identification, e.g. 10.9.0.9:9090';
COMMENT ON COLUMN alerting_engines.cluster IS 'target reader cluster';

View File

@@ -23,11 +23,6 @@ class Sender(object):
def send_feishu(cls, payload):
# already done in go code
pass
@classmethod
def send_mm(cls, payload):
# already done in go code
pass
@classmethod
def send_sms(cls, payload):
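The hunk above shows `Sender` methods named `send_<channel>`, some of which are no-ops because the Go side already delivers those channels. A minimal sketch of how such a class could be driven by channel name follows. The `dispatch` helper, the `sent` list, and the payload shape are assumptions for illustration and are not taken from the script itself.

```python
class Sender(object):
    """Hypothetical stand-in for the script's Sender class."""

    sent = []  # records (channel, payload) pairs so the sketch is observable

    @classmethod
    def send_feishu(cls, payload):
        # already done in go code, so the script does nothing here
        pass

    @classmethod
    def send_sms(cls, payload):
        # placeholder for a call to a custom SMS gateway (assumption)
        cls.sent.append(("sms", payload))


def dispatch(channels, payload):
    """Call Sender.send_<channel> for every channel that has a handler.

    Returns the list of channels for which a handler was found.
    """
    handled = []
    for ch in channels:
        handler = getattr(Sender, "send_" + ch, None)
        if handler is None:
            continue  # no handler defined for this channel; skip it
        handler(payload)
        handled.append(ch)
    return handled
```

With this shape, removing a method such as `send_mm` simply means that channel is skipped by the script.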

View File

@@ -29,7 +29,6 @@ func (n *N9EPlugin) Notify(bs []byte) {
"dingtalk_robot_token",
"wecom_robot_token",
"feishu_robot_token",
"telegram_robot_token",
}
for _, ch := range channels {
if ret := gjson.GetBytes(bs, ch); ret.Exists() {
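The Go loop above checks, for each channel key, whether that key exists in the raw JSON bytes. A rough Python equivalent is sketched below, assuming the keys sit at the top level of the JSON object. The function name `present_channels` and the sample payload layout are hypothetical, and only the three keys visible in the hunk are listed.

```python
import json

# Keys visible in the hunk; the real list may contain more entries.
CHANNEL_KEYS = [
    "dingtalk_robot_token",
    "wecom_robot_token",
    "feishu_robot_token",
]


def present_channels(raw):
    """Return {key: value} for each channel key found at the top level."""
    doc = json.loads(raw)
    if not isinstance(doc, dict):
        return {}  # a non-object payload carries no channel keys
    return {key: doc[key] for key in CHANNEL_KEYS if key in doc}
```

Dropping a key from the list, as the hunk does with `telegram_robot_token`, means a payload carrying that key is no longer picked up by the loop.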

View File

@@ -9,12 +9,7 @@ ClusterName = "Default"
BusiGroupLabelKey = "busigroup"
# sleep x seconds, then start judge engine
EngineDelay = 60
DisableUsageReport = true
# config | database
ReaderFrom = "config"
EngineDelay = 120
[Log]
# log write dir
@@ -73,12 +68,10 @@ InsecureSkipVerify = true
Batch = 5
[Alerting]
# timeout settings, unit: ms, default: 30000ms
Timeout=30000
TemplatesDir = "./etc/template"
NotifyConcurrency = 10
# use builtin go code notify
NotifyBuiltinChannels = ["email", "dingtalk", "wecom", "feishu", "mm", "telegram"]
NotifyBuiltinChannels = ["email", "dingtalk", "wecom", "feishu"]
[Alerting.CallScript]
# built in sending capability in go code
@@ -90,8 +83,7 @@ ScriptPath = "./etc/script/notify.py"
Enable = false
# use a plugin via `go build -buildmode=plugin -o notify.so`
PluginPath = "./etc/script/notify.so"
# The first letter must be capitalized to be exported
Caller = "N9eCaller"
Caller = "n9eCaller"
[Alerting.RedisPub]
Enable = false
@@ -109,7 +101,7 @@ Headers = ["Content-Type", "application/json", "X-From", "N9E"]
[NoData]
Metric = "target_up"
# unit: second
Interval = 120
Interval = 15
[Ibex]
# callback: ${ibex}/${tplid}/${host}
@@ -144,7 +136,7 @@ MaxIdleConns = 50
# table prefix
TablePrefix = ""
# enable auto migrate or not
# EnableAutoMigrate = false
EnableAutoMigrate = false
[Reader]
# prometheus base url
@@ -155,18 +147,23 @@ BasicAuthUser = ""
BasicAuthPass = ""
# timeout settings, unit: ms
Timeout = 30000
DialTimeout = 3000
MaxIdleConnsPerHost = 100
DialTimeout = 10000
TLSHandshakeTimeout = 30000
ExpectContinueTimeout = 1000
IdleConnTimeout = 90000
# time duration, unit: ms
KeepAlive = 30000
MaxConnsPerHost = 0
MaxIdleConns = 100
MaxIdleConnsPerHost = 10
[WriterOpt]
# queue channel count
QueueCount = 1000
QueueCount = 100
# queue max size
QueueMaxSize = 1000000
QueueMaxSize = 200000
# once pop samples number from queue
QueuePopSize = 1000
# metric or ident
ShardingKey = "ident"
QueuePopSize = 2000
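One version of this section pairs `QueueCount` with a `ShardingKey` of `metric` or `ident`. The sketch below shows one way a sample could be mapped to a queue under those settings. The hash function (CRC32), the sample layout, and the function name `queue_index` are assumptions; the diff does not show how Nightingale actually shards.

```python
import zlib


def queue_index(sample, sharding_key="ident", queue_count=100):
    """Pick a queue for a sample by hashing the configured sharding key.

    `sample` is assumed to look like {"labels": {"__name__": ..., "ident": ...}}.
    """
    labels = sample["labels"]
    if sharding_key == "ident":
        key = labels.get("ident", "")
    else:
        # "metric": shard on the metric name instead
        key = labels.get("__name__", "")
    # CRC32 is deterministic across runs, unlike Python's built-in hash()
    return zlib.crc32(key.encode("utf-8")) % queue_count
```

Sharding on `ident` keeps all series from one host in the same queue, while sharding on `metric` groups by metric name; which spreads load better depends on the data.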
[[Writers]]
Url = "http://prometheus:9090/api/v1/write"
@@ -175,8 +172,8 @@ BasicAuthUser = ""
# Basic auth password
BasicAuthPass = ""
# timeout settings, unit: ms
Timeout = 10000
DialTimeout = 3000
Timeout = 30000
DialTimeout = 10000
TLSHandshakeTimeout = 30000
ExpectContinueTimeout = 1000
IdleConnTimeout = 90000
@@ -185,12 +182,6 @@ KeepAlive = 30000
MaxConnsPerHost = 0
MaxIdleConns = 100
MaxIdleConnsPerHost = 100
# [[Writers.WriteRelabels]]
# Action = "replace"
# SourceLabels = ["__address__"]
# Regex = "([^:]+)(?::\\d+)?"
# Replacement = "$1:80"
# TargetLabel = "__address__"
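The commented-out `WriteRelabels` example rewrites the port in `__address__` to 80. The sketch below reproduces that `replace` action, assuming Prometheus-style relabel semantics (source label values joined with `;`, the regex anchored to the whole value, `$1` expanded in the replacement). The helper name `relabel_replace` is hypothetical, and details such as dropping a label whose new value is empty are left out.

```python
import re


def relabel_replace(labels, source_labels, regex, replacement, target_label,
                    separator=";"):
    """Apply one 'replace' relabel rule and return a new label dict."""
    value = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, value)
    if match is None:
        return dict(labels)  # rule does not apply; labels unchanged
    # Convert "$1"-style references to Python's "\g<1>" template syntax
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    out = dict(labels)
    out[target_label] = match.expand(template)
    return out
```

Note that the TOML string `"([^:]+)(?::\\d+)?"` unescapes to the regex `([^:]+)(?::\d+)?`, which is what the function expects.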
# [[Writers]]
# Url = "http://m3db:7201/api/v1/prom/remote/write"

View File

@@ -1,7 +0,0 @@
**级别状态**: {{if .IsRecovered}}<font color="info">S{{.Severity}} Recovered</font>{{else}}<font color="warning">S{{.Severity}} Triggered</font>{{end}}
**规则标题**: {{.RuleName}}{{if .RuleNote}}
**规则备注**: {{.RuleNote}}{{end}}
**监控指标**: {{.TagsJSON}}
{{if .IsRecovered}}**恢复时间**{{timeformat .LastEvalTime}}{{else}}**触发时间**: {{timeformat .TriggerTime}}
**触发时值**: {{.TriggerValue}}{{end}}
**发送时间**: {{timestamp}}
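The removed template branches on `.IsRecovered` to choose the status line and to show either the recovery time or the trigger time and value. The same branching is sketched below in Python. The event field names, the English labels, and the UTC `%Y-%m-%d %H:%M:%S` time format are assumptions made for the sketch; the output of the template's `timeformat` helper is not shown in the diff.

```python
from datetime import datetime, timezone


def fmt(ts):
    """Format a Unix timestamp in UTC (assumed format)."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


def render(event, now):
    """Render an alert event into message text mirroring the template."""
    if event["is_recovered"]:
        status = '<font color="info">S%d Recovered</font>' % event["severity"]
    else:
        status = '<font color="warning">S%d Triggered</font>' % event["severity"]
    lines = ["**Status**: " + status, "**Rule**: " + event["rule_name"]]
    if event.get("rule_note"):
        lines.append("**Note**: " + event["rule_note"])  # only when a note exists
    lines.append("**Metric**: " + event["tags_json"])
    if event["is_recovered"]:
        lines.append("**Recovered at**: " + fmt(event["last_eval_time"]))
    else:
        lines.append("**Triggered at**: " + fmt(event["trigger_time"]))
        lines.append("**Value**: " + event["trigger_value"])
    lines.append("**Sent at**: " + fmt(now))
    return "\n".join(lines)
```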

View File

@@ -4,21 +4,12 @@ RunMode = "release"
# # custom i18n dict config
# I18N = "./etc/i18n.json"
# # custom i18n request header key
# I18NHeaderKey = "X-Language"
# metrics descriptions
MetricsYamlFile = "./etc/metrics.yaml"
BuiltinAlertsDir = "./etc/alerts"
BuiltinDashboardsDir = "./etc/dashboards"
# config | api
ClustersFrom = "config"
# using when ClustersFrom = "api"
ClustersFromAPIs = []
[[NotifyChannels]]
Label = "邮箱"
# do not change Key
@@ -39,16 +30,6 @@ Label = "飞书机器人"
# do not change Key
Key = "feishu"
[[NotifyChannels]]
Label = "mm bot"
# do not change Key
Key = "mm"
[[NotifyChannels]]
Label = "telegram机器人"
# do not change Key
Key = "telegram"
[[ContactKeys]]
Label = "Wecom Robot Token"
# do not change Key
@@ -64,16 +45,6 @@ Label = "Feishu Robot Token"
# do not change Key
Key = "feishu_robot_token"
[[ContactKeys]]
Label = "MatterMost Webhook URL"
# do not change Key
Key = "mm_webhook_url"
[[ContactKeys]]
Label = "Telegram Robot Token"
# do not change Key
Key = "telegram_robot_token"
[Log]
# log write dir
Dir = "logs"
@@ -121,13 +92,6 @@ AccessExpired = 1500
RefreshExpired = 10080
RedisKeyPrefix = "/jwt/"
[ProxyAuth]
# if proxy auth enabled, jwt auth is disabled
Enable = false
# username key in http proxy header
HeaderUserNameKey = "X-User-Name"
DefaultRoles = ["Standard"]
[BasicAuth]
user001 = "ccc26da7b9aba533cbb263a36c07dcc5"
@@ -157,20 +121,6 @@ Nickname = "cn"
Phone = "mobile"
Email = "mail"
[OIDC]
Enable = false
RedirectURL = "http://n9e.com/callback"
SsoAddr = "http://sso.example.org"
ClientId = ""
ClientSecret = ""
CoverAttributes = true
DefaultRoles = ["Standard"]
[OIDC.Attributes]
Nickname = "nickname"
Phone = "phone_number"
Email = "email"
[Redis]
# address, ip:port
Address = "redis:6379"
@@ -195,7 +145,7 @@ MaxIdleConns = 50
# table prefix
TablePrefix = ""
# enable auto migrate or not
# EnableAutoMigrate = false
EnableAutoMigrate = false
[[Clusters]]
# Prometheus cluster name
@@ -208,7 +158,14 @@ BasicAuthUser = ""
BasicAuthPass = ""
# timeout settings, unit: ms
Timeout = 30000
DialTimeout = 3000
DialTimeout = 10000
TLSHandshakeTimeout = 30000
ExpectContinueTimeout = 1000
IdleConnTimeout = 90000
# time duration, unit: ms
KeepAlive = 30000
MaxConnsPerHost = 0
MaxIdleConns = 100
MaxIdleConnsPerHost = 100
[Ibex]
@@ -217,10 +174,4 @@ Address = "http://ibex:10090"
BasicAuthUser = "ibex"
BasicAuthPass = "ibex"
# unit: ms
Timeout = 3000
[TargetMetrics]
TargetUp = '''max(max_over_time(target_up{ident=~"(%s)"}[%dm])) by (ident)'''
LoadPerCore = '''max(max_over_time(system_load_norm_1{ident=~"(%s)"}[%dm])) by (ident)'''
MemUtil = '''100-max(max_over_time(mem_available_percent{ident=~"(%s)"}[%dm])) by (ident)'''
DiskUtil = '''max(max_over_time(disk_used_percent{ident=~"(%s)", path="/"}[%dm])) by (ident)'''
Timeout = 3000
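The removed `[TargetMetrics]` entries are format strings with a `%s` placeholder inside an `ident=~"(...)"` regex matcher and a `%d` placeholder for the range in minutes. A minimal sketch of how they could be expanded follows, assuming the idents are joined with `|` and passed through `fmt.Sprintf`. The `BuildQuery` helper is a hypothetical name, not taken from the project.

```go
package main

import (
	"fmt"
	"strings"
)

// TargetUp is copied from the [TargetMetrics] section in the diff.
const TargetUp = `max(max_over_time(target_up{ident=~"(%s)"}[%dm])) by (ident)`

// BuildQuery fills a TargetMetrics template with a list of idents and a
// look-back window in minutes. It is an illustrative helper (assumption).
func BuildQuery(tpl string, idents []string, minutes int) string {
	return fmt.Sprintf(tpl, strings.Join(idents, "|"), minutes)
}

func main() {
	fmt.Println(BuildQuery(TargetUp, []string{"host1", "host2"}, 5))
}
```

This sketch does not escape regex metacharacters in the idents; identifiers that contain characters such as `.` would match more loosely than intended unless they are escaped first.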

File diff suppressed because it is too large Load Diff


View File

@@ -1,383 +1,131 @@
zh:
cpu_usage_idle: CPU空闲率(单位:%)
cpu_usage_active: CPU使用率(单位:%)
cpu_usage_system: CPU内核态时间占比(单位:%)
cpu_usage_user: CPU用户态时间占比(单位:%)
cpu_usage_nice: 低优先级用户态CPU时间占比,也就是进程nice值被调整为1-19之间的CPU时间。这里注意,nice可取值范围是-20到19,数值越大,优先级反而越低(单位:%)
cpu_usage_iowait: CPU等待I/O的时间占比(单位:%)
cpu_usage_irq: CPU处理中断的时间占比(单位:%)
cpu_usage_softirq: CPU处理软中断的时间占比(单位:%)
cpu_usage_steal: 在虚拟机环境下有该指标,表示CPU被其他虚拟机争用的时间占比,超过20就表示争抢严重(单位:%)
cpu_usage_guest: 通过虚拟化运行其他操作系统的时间,也就是运行虚拟机的CPU时间占比(单位:%)
cpu_usage_guest_nice: 以低优先级运行虚拟机的时间占比(单位:%)
cpu_usage_idle: CPU空闲率(单位:%)
cpu_usage_active: CPU使用率(单位:%)
cpu_usage_system: CPU内核态时间占比(单位:%)
cpu_usage_user: CPU用户态时间占比(单位:%)
cpu_usage_nice: 低优先级用户态CPU时间占比,也就是进程nice值被调整为1-19之间的CPU时间。这里注意,nice可取值范围是-20到19,数值越大,优先级反而越低(单位:%)
cpu_usage_iowait: CPU等待I/O的时间占比(单位:%)
cpu_usage_irq: CPU处理硬中断的时间占比(单位:%)
cpu_usage_softirq: CPU处理中断的时间占比(单位:%)
cpu_usage_steal: 在虚拟机环境下有该指标,表示CPU被其他虚拟机争用的时间占比,超过20就表示争抢严重(单位:%)
cpu_usage_guest: 通过虚拟化运行其他操作系统的时间,也就是运行虚拟机的CPU时间占比(单位:%)
cpu_usage_guest_nice: 以低优先级运行虚拟机的时间占比(单位:%)
disk_free: 硬盘分区剩余量(单位:byte)
disk_used: 硬盘分区使用量(单位:byte)
disk_used_percent: 硬盘分区使用率(单位:%)
disk_total: 硬盘分区总量(单位:byte)
disk_inodes_free: 硬盘分区inode剩余量
disk_inodes_used: 硬盘分区inode使用量
disk_inodes_total: 硬盘分区inode总量
disk_free: 硬盘分区剩余量(单位:byte)
disk_used: 硬盘分区使用量(单位:byte)
disk_used_percent: 硬盘分区使用率(单位:%)
disk_total: 硬盘分区总量(单位:byte)
disk_inodes_free: 硬盘分区inode剩余量
disk_inodes_used: 硬盘分区inode使用量
disk_inodes_total: 硬盘分区inode总量
diskio_io_time: 从设备视角来看I/O请求总时间队列中有I/O请求就计数单位毫秒counter类型需要用函数求rate才有使用价值
diskio_iops_in_progress: 已经分配给设备驱动且尚未完成的IO请求不包含在队列中但尚未分配给设备驱动的IO请求gauge类型
diskio_merged_reads: 相邻读请求merge读的次数counter类型
diskio_merged_writes: 相邻写请求merge写的次数counter类型
diskio_read_bytes: 读取的byte数量counter类型需要用函数求rate才有使用价值
diskio_read_time: 读请求总时间单位毫秒counter类型需要用函数求rate才有使用价值
diskio_reads: 读请求次数counter类型需要用函数求rate才有使用价值
diskio_weighted_io_time: 从I/O请求视角来看I/O等待总时间如果同时有多个I/O请求时间会叠加单位毫秒
diskio_write_bytes: 写入的byte数量counter类型需要用函数求rate才有使用价值
diskio_write_time: 写请求总时间单位毫秒counter类型需要用函数求rate才有使用价值
diskio_writes: 写请求次数counter类型需要用函数求rate才有使用价值
diskio_io_time: 从设备视角来看I/O请求总时间队列中有I/O请求就计数单位毫秒counter类型需要用函数求rate才有使用价值
diskio_iops_in_progress: 已经分配给设备驱动且尚未完成的IO请求不包含在队列中但尚未分配给设备驱动的IO请求gauge类型
diskio_merged_reads: 相邻读请求merge读的次数counter类型
diskio_merged_writes: 相邻写请求merge写的次数counter类型
diskio_read_bytes: 读取的byte数量counter类型需要用函数求rate才有使用价值
diskio_read_time: 读请求总时间单位毫秒counter类型需要用函数求rate才有使用价值
diskio_reads: 读请求次数counter类型需要用函数求rate才有使用价值
diskio_weighted_io_time: 从I/O请求视角来看I/O等待总时间如果同时有多个I/O请求时间会叠加单位毫秒
diskio_write_bytes: 写入的byte数量counter类型需要用函数求rate才有使用价值
diskio_write_time: 写请求总时间单位毫秒counter类型需要用函数求rate才有使用价值
diskio_writes: 写请求次数counter类型需要用函数求rate才有使用价值
kernel_boot_time: 内核启动时间
kernel_context_switches: 内核上下文切换次数
kernel_entropy_avail: linux系统内部的熵池
kernel_interrupts: 内核中断次数
kernel_processes_forked: fork的进程数
kernel_boot_time: 内核启动时间
kernel_context_switches: 内核上下文切换次数
kernel_entropy_avail: linux系统内部的熵池
kernel_interrupts: 内核中断次数
kernel_processes_forked: fork的进程数
mem_active: 活跃使用的内存总数(包括cache和buffer内存)
mem_available: 应用程序可用内存数
mem_available_percent: 内存剩余百分比(0~100)
mem_buffered: 用来给文件做缓冲大小
mem_cached: 被高速缓冲存储器cache memory用的内存的大小等于 diskcache minus SwapCache
mem_commit_limit: 根据超额分配比率('vm.overcommit_ratio'这是当前在系统上分配可用的内存总量这个限制只是在模式2('vm.overcommit_memory')时启用
mem_committed_as: 目前在系统上分配的内存量。是所有进程申请的内存的总和
mem_dirty: 等待被写回到磁盘的内存大小
mem_free: 空闲内存数
mem_high_free: 未被使用的高位内存大小
mem_high_total: 高位内存总大小Highmem是指所有内存高于860MB的物理内存,Highmem区域供用户程序使用或用于页面缓存。该区域不是直接映射到内核空间。内核必须使用不同的手法使用该段内存
mem_huge_page_size: 每个大页的大小
mem_huge_pages_free: 池中尚未分配的 HugePages 数量
mem_huge_pages_total: 预留HugePages的总个数
mem_inactive: 空闲的内存数(包括free和avalible的内存)
mem_low_free: 未被使用的低位大小
mem_low_total: 低位内存总大小,低位可以达到高位内存一样的作用,而且它还能够被内核用来记录一些自己的数据结构
mem_mapped: 设备和文件等映射的大小
mem_page_tables: 管理内存分页页面的索引表的大小
mem_shared: 多个进程共享的内存总额
mem_slab: 内核数据结构缓存的大小,可以减少申请和释放内存带来的消耗
mem_sreclaimable: 可收回Slab的大小
mem_sunreclaim: 不可收回Slab的大小SUnreclaim+SReclaimableSlab
mem_swap_cached: 被高速缓冲存储器cache memory用的交换空间的大小已经被交换出来的内存但仍然被存放在swapfile中。用来在需要的时候很快的被替换而不需要再次打开I/O端口
mem_swap_free: 未被使用交换空间的大小
mem_swap_total: 交换空间的总大小
mem_total: 内存总数
mem_used: 已用内存数
mem_used_percent: 已用内存数百分比(0~100)
mem_vmalloc_chunk: 最大的连续未被使用的vmalloc区域
mem_vmalloc_totalL: 可以vmalloc虚拟内存大小
mem_vmalloc_used: vmalloc已使用的虚拟内存大小
mem_write_back: 正在被写回到磁盘的内存大小
mem_write_back_tmp: FUSE用于临时写回缓冲区的内存
mem_active: 活跃使用的内存总数(包括cache和buffer内存)
mem_available: 应用程序可用内存数
mem_available_percent: 内存剩余百分比(0~100)
mem_buffered: 用来给文件做缓冲大小
mem_cached: 被高速缓冲存储器cache memory用的内存的大小等于 diskcache minus SwapCache
mem_commit_limit: 根据超额分配比率('vm.overcommit_ratio'这是当前在系统上分配可用的内存总量这个限制只是在模式2('vm.overcommit_memory')时启用
mem_committed_as: 目前在系统上分配的内存量。是所有进程申请的内存的总和
mem_dirty: 等待被写回到磁盘的内存大小
mem_free: 空闲内存数
mem_high_free: 未被使用的高位内存大小
mem_high_total: 高位内存总大小Highmem是指所有内存高于860MB的物理内存,Highmem区域供用户程序使用或用于页面缓存。该区域不是直接映射到内核空间。内核必须使用不同的手法使用该段内存
mem_huge_page_size: 每个大页的大小
mem_huge_pages_free: 池中尚未分配的 HugePages 数量
mem_huge_pages_total: 预留HugePages的总个数
mem_inactive: 空闲的内存数(包括free和avalible的内存)
mem_low_free: 未被使用的低位大小
mem_low_total: 低位内存总大小,低位可以达到高位内存一样的作用,而且它还能够被内核用来记录一些自己的数据结构
mem_mapped: 设备和文件等映射的大小
mem_page_tables: 管理内存分页页面的索引表的大小
mem_shared: 多个进程共享的内存总额
mem_slab: 内核数据结构缓存的大小,可以减少申请和释放内存带来的消耗
mem_sreclaimable: 可收回Slab的大小
mem_sunreclaim: 不可收回Slab的大小SUnreclaim+SReclaimableSlab
mem_swap_cached: 被高速缓冲存储器cache memory用的交换空间的大小已经被交换出来的内存但仍然被存放在swapfile中。用来在需要的时候很快的被替换而不需要再次打开I/O端口
mem_swap_free: 未被使用交换空间的大小
mem_swap_total: 交换空间的总大小
mem_total: 内存总数
mem_used: 已用内存数
mem_used_percent: 已用内存数百分比(0~100)
mem_vmalloc_chunk: 最大的连续未被使用的vmalloc区域
mem_vmalloc_totalL: 可以vmalloc虚拟内存大小
mem_vmalloc_used: vmalloc已使用的虚拟内存大小
mem_write_back: 正在被写回到磁盘的内存大小
mem_write_back_tmp: FUSE用于临时写回缓冲区的内存
net_bytes_recv: 网卡收包总数(bytes)
net_bytes_sent: 网卡发包总数(bytes)
net_drop_in: 网卡收丢包数量
net_drop_out: 网卡发丢包数量
net_err_in: 网卡收包错误数量
net_err_out: 网卡发包错误数量
net_packets_recv: 网卡收包数量
net_packets_sent: 网卡发包数量
net_bytes_recv: 网卡收包总数(bytes)
net_bytes_sent: 网卡发包总数(bytes)
net_drop_in: 网卡收丢包数量
net_drop_out: 网卡发丢包数量
net_err_in: 网卡收包错误数量
net_err_out: 网卡发包错误数量
net_packets_recv: 网卡收包数量
net_packets_sent: 网卡发包数量
netstat_tcp_established: ESTABLISHED状态的网络链接数
netstat_tcp_fin_wait1: FIN_WAIT1状态的网络链接数
netstat_tcp_fin_wait2: FIN_WAIT2状态的网络链接数
netstat_tcp_last_ack: LAST_ACK状态的网络链接数
netstat_tcp_listen: LISTEN状态的网络链接数
netstat_tcp_syn_recv: SYN_RECV状态的网络链接数
netstat_tcp_syn_sent: SYN_SENT状态的网络链接数
netstat_tcp_time_wait: TIME_WAIT状态的网络链接数
netstat_udp_socket: UDP状态的网络链接数
netstat_tcp_established: ESTABLISHED状态的网络链接数
netstat_tcp_fin_wait1: FIN_WAIT1状态的网络链接数
netstat_tcp_fin_wait2: FIN_WAIT2状态的网络链接数
netstat_tcp_last_ack: LAST_ACK状态的网络链接数
netstat_tcp_listen: LISTEN状态的网络链接数
netstat_tcp_syn_recv: SYN_RECV状态的网络链接数
netstat_tcp_syn_sent: SYN_SENT状态的网络链接数
netstat_tcp_time_wait: TIME_WAIT状态的网络链接数
netstat_udp_socket: UDP状态的网络链接数
#[ping]
ping_percent_packet_loss: ping数据包丢失百分比(%)
ping_result_code: ping返回码('0','1')
processes_blocked: 不可中断的睡眠状态下的进程数('U','D','L')
processes_dead: 回收中的进程数('X')
processes_idle: 挂起的空闲进程数('I')
processes_paging: 分页进程数('P')
processes_running: 运行中的进程数('R')
processes_sleeping: 可中断进程数('S')
processes_stopped: 暂停状态进程数('T')
processes_total: 总进程数
processes_total_threads: 总线程数
processes_unknown: 未知状态进程数
processes_zombies: 僵尸态进程数('Z')
processes_blocked: 不可中断的睡眠状态下的进程数('U','D','L')
processes_dead: 回收中的进程数('X')
processes_idle: 挂起的空闲进程数('I')
processes_paging: 分页进程数('P')
processes_running: 运行中的进程数('R')
processes_sleeping: 可中断进程数('S')
processes_stopped: 暂停状态进程数('T')
processes_total: 总进程数
processes_total_threads: 总线程数
processes_unknown: 未知状态进程数
processes_zombies: 僵尸态进程数('Z')
swap_used_percent: Swap空间换出数据量
swap_used_percent: Swap空间换出数据量
system_load1: 1分钟平均load值
system_load5: 5分钟平均load值
system_load15: 15分钟平均load值
system_n_users: 用户数
system_n_cpus: CPU核数
system_uptime: 系统启动时间
system_load1: 1分钟平均load值
system_load5: 5分钟平均load值
system_load15: 15分钟平均load值
system_n_users: 用户
system_n_cpus: CPU核数
system_uptime: 系统启动时间
nginx_accepts: 自nginx启动起,与客户端建立过得连接总数
nginx_active: 当前nginx正在处理的活动连接数,等于Reading/Writing/Waiting总和
nginx_handled: 自nginx启动起,处理过的客户端连接总数
nginx_reading: 正在读取HTTP请求头部的连接总
nginx_requests: 自nginx启动起,处理过的客户端请求总数,由于存在HTTP Keep-Alive请求,该值会大于handled值
nginx_upstream_check_fall: upstream_check模块检测到后端失败的次数
nginx_upstream_check_rise: upstream_check模块对后端的检测次数
nginx_upstream_check_status_code: 后端upstream的状态,up为1,down为0
nginx_waiting: 开启 keep-alive 的情况下,这个值等于 active (reading+writing), 意思就是 Nginx 已经处理完正在等候下一次请求指令的驻留连接
nginx_writing: 正在向客户端发送响应的连接总数
nginx_accepts: 自nginx启动起,与客户端建立过得连接总数
nginx_active: 当前nginx正在处理的活动连接数,等于Reading/Writing/Waiting总和
nginx_handled: 自nginx启动起,处理过的客户端连接总数
nginx_reading: 正在读取HTTP请求头部的连接总数
nginx_requests: 自nginx启动起,处理过的客户端请求总数,由于存在HTTP Keep-Alive请求,该值会大于handled值
nginx_upstream_check_fall: upstream_check模块检测到后端失败的次数
nginx_upstream_check_rise: upstream_check模块对后端的检测次数
nginx_upstream_check_status_code: 后端upstream的状态,up为1,down为0
nginx_waiting: 开启 keep-alive 的情况下,这个值等于 active (reading+writing), 意思就是 Nginx 已经处理完正在等候下一次请求指令的驻留连接
nginx_writing: 正在向客户端发送响应的连接总数
http_response_content_length: HTTP消息实体的传输长度
http_response_http_response_code: http响应状态码
http_response_response_time: http响应用时
http_response_result_code: url探测结果0为正常否则url无法访问
# [aws cloudwatch rds]
cloudwatch_aws_rds_bin_log_disk_usage_average: rds 磁盘使用平均值
cloudwatch_aws_rds_bin_log_disk_usage_maximum: rds 磁盘使用量最大值
cloudwatch_aws_rds_bin_log_disk_usage_minimum: rds binlog 磁盘使用量最低
cloudwatch_aws_rds_bin_log_disk_usage_sample_count: rds binlog 磁盘使用情况样本计数
cloudwatch_aws_rds_bin_log_disk_usage_sum: rds binlog 磁盘使用总和
cloudwatch_aws_rds_burst_balance_average: rds 突发余额平均值
cloudwatch_aws_rds_burst_balance_maximum: rds 突发余额最大值
cloudwatch_aws_rds_burst_balance_minimum: rds 突发余额最低
cloudwatch_aws_rds_burst_balance_sample_count: rds 突发平衡样本计数
cloudwatch_aws_rds_burst_balance_sum: rds 突发余额总和
cloudwatch_aws_rds_cpu_utilization_average: rds cpu 利用率平均值
cloudwatch_aws_rds_cpu_utilization_maximum: rds cpu 利用率最大值
cloudwatch_aws_rds_cpu_utilization_minimum: rds cpu 利用率最低
cloudwatch_aws_rds_cpu_utilization_sample_count: rds cpu 利用率样本计数
cloudwatch_aws_rds_cpu_utilization_sum: rds cpu 利用率总和
cloudwatch_aws_rds_database_connections_average: rds 数据库连接平均值
cloudwatch_aws_rds_database_connections_maximum: rds 数据库连接数最大值
cloudwatch_aws_rds_database_connections_minimum: rds 数据库连接最小
cloudwatch_aws_rds_database_connections_sample_count: rds 数据库连接样本数
cloudwatch_aws_rds_database_connections_sum: rds 数据库连接总和
cloudwatch_aws_rds_db_load_average: rds db 平均负载
cloudwatch_aws_rds_db_load_cpu_average: rds db 负载 cpu 平均值
cloudwatch_aws_rds_db_load_cpu_maximum: rds db 负载 cpu 最大值
cloudwatch_aws_rds_db_load_cpu_minimum: rds db 负载 cpu 最小值
cloudwatch_aws_rds_db_load_cpu_sample_count: rds db 加载 CPU 样本数
cloudwatch_aws_rds_db_load_cpu_sum: rds db 加载cpu总和
cloudwatch_aws_rds_db_load_maximum: rds 数据库负载最大值
cloudwatch_aws_rds_db_load_minimum: rds 数据库负载最小值
cloudwatch_aws_rds_db_load_non_cpu_average: rds 加载非 CPU 平均值
cloudwatch_aws_rds_db_load_non_cpu_maximum: rds 加载非 cpu 最大值
cloudwatch_aws_rds_db_load_non_cpu_minimum: rds 加载非 cpu 最小值
cloudwatch_aws_rds_db_load_non_cpu_sample_count: rds 加载非 cpu 样本计数
cloudwatch_aws_rds_db_load_non_cpu_sum: rds 加载非cpu总和
cloudwatch_aws_rds_db_load_sample_count: rds db 加载样本计数
cloudwatch_aws_rds_db_load_sum: rds db 负载总和
cloudwatch_aws_rds_disk_queue_depth_average: rds 磁盘队列深度平均值
cloudwatch_aws_rds_disk_queue_depth_maximum: rds 磁盘队列深度最大值
cloudwatch_aws_rds_disk_queue_depth_minimum: rds 磁盘队列深度最小值
cloudwatch_aws_rds_disk_queue_depth_sample_count: rds 磁盘队列深度样本计数
cloudwatch_aws_rds_disk_queue_depth_sum: rds 磁盘队列深度总和
cloudwatch_aws_rds_ebs_byte_balance__average: rds ebs 字节余额平均值
cloudwatch_aws_rds_ebs_byte_balance__maximum: rds ebs 字节余额最大值
cloudwatch_aws_rds_ebs_byte_balance__minimum: rds ebs 字节余额最低
cloudwatch_aws_rds_ebs_byte_balance__sample_count: rds ebs 字节余额样本数
cloudwatch_aws_rds_ebs_byte_balance__sum: rds ebs 字节余额总和
cloudwatch_aws_rds_ebsio_balance__average: rds ebsio 余额平均值
cloudwatch_aws_rds_ebsio_balance__maximum: rds ebsio 余额最大值
cloudwatch_aws_rds_ebsio_balance__minimum: rds ebsio 余额最低
cloudwatch_aws_rds_ebsio_balance__sample_count: rds ebsio 平衡样本计数
cloudwatch_aws_rds_ebsio_balance__sum: rds ebsio 余额总和
cloudwatch_aws_rds_free_storage_space_average: rds 免费存储空间平均
cloudwatch_aws_rds_free_storage_space_maximum: rds 最大可用存储空间
cloudwatch_aws_rds_free_storage_space_minimum: rds 最低可用存储空间
cloudwatch_aws_rds_free_storage_space_sample_count: rds 可用存储空间样本数
cloudwatch_aws_rds_free_storage_space_sum: rds 免费存储空间总和
cloudwatch_aws_rds_freeable_memory_average: rds 可用内存平均值
cloudwatch_aws_rds_freeable_memory_maximum: rds 最大可用内存
cloudwatch_aws_rds_freeable_memory_minimum: rds 最小可用内存
cloudwatch_aws_rds_freeable_memory_sample_count: rds 可释放内存样本数
cloudwatch_aws_rds_freeable_memory_sum: rds 可释放内存总和
cloudwatch_aws_rds_lvm_read_iops_average: rds lvm 读取 iops 平均值
cloudwatch_aws_rds_lvm_read_iops_maximum: rds lvm 读取 iops 最大值
cloudwatch_aws_rds_lvm_read_iops_minimum: rds lvm 读取 iops 最低
cloudwatch_aws_rds_lvm_read_iops_sample_count: rds lvm 读取 iops 样本计数
cloudwatch_aws_rds_lvm_read_iops_sum: rds lvm 读取 iops 总和
cloudwatch_aws_rds_lvm_write_iops_average: rds lvm 写入 iops 平均值
cloudwatch_aws_rds_lvm_write_iops_maximum: rds lvm 写入 iops 最大值
cloudwatch_aws_rds_lvm_write_iops_minimum: rds lvm 写入 iops 最低
cloudwatch_aws_rds_lvm_write_iops_sample_count: rds lvm 写入 iops 样本计数
cloudwatch_aws_rds_lvm_write_iops_sum: rds lvm 写入 iops 总和
cloudwatch_aws_rds_network_receive_throughput_average: rds 网络接收吞吐量平均
cloudwatch_aws_rds_network_receive_throughput_maximum: rds 网络接收吞吐量最大值
cloudwatch_aws_rds_network_receive_throughput_minimum: rds 网络接收吞吐量最小值
cloudwatch_aws_rds_network_receive_throughput_sample_count: rds 网络接收吞吐量样本计数
cloudwatch_aws_rds_network_receive_throughput_sum: rds 网络接收吞吐量总和
cloudwatch_aws_rds_network_transmit_throughput_average: rds 网络传输吞吐量平均值
cloudwatch_aws_rds_network_transmit_throughput_maximum: rds 网络传输吞吐量最大
cloudwatch_aws_rds_network_transmit_throughput_minimum: rds 网络传输吞吐量最小值
cloudwatch_aws_rds_network_transmit_throughput_sample_count: rds 网络传输吞吐量样本计数
cloudwatch_aws_rds_network_transmit_throughput_sum: rds 网络传输吞吐量总和
cloudwatch_aws_rds_read_iops_average: rds 读取 iops 平均值
cloudwatch_aws_rds_read_iops_maximum: rds 最大读取 iops
cloudwatch_aws_rds_read_iops_minimum: rds 读取 iops 最低
cloudwatch_aws_rds_read_iops_sample_count: rds 读取 iops 样本计数
cloudwatch_aws_rds_read_iops_sum: rds 读取 iops 总和
cloudwatch_aws_rds_read_latency_average: rds 读取延迟平均值
cloudwatch_aws_rds_read_latency_maximum: rds 读取延迟最大值
cloudwatch_aws_rds_read_latency_minimum: rds 最小读取延迟
cloudwatch_aws_rds_read_latency_sample_count: rds 读取延迟样本计数
cloudwatch_aws_rds_read_latency_sum: rds 读取延迟总和
cloudwatch_aws_rds_read_throughput_average: rds 读取吞吐量平均值
cloudwatch_aws_rds_read_throughput_maximum: rds 最大读取吞吐量
cloudwatch_aws_rds_read_throughput_minimum: rds 最小读取吞吐量
cloudwatch_aws_rds_read_throughput_sample_count: rds 读取吞吐量样本计数
cloudwatch_aws_rds_read_throughput_sum: rds 读取吞吐量总和
cloudwatch_aws_rds_swap_usage_average: rds 交换使用平均值
cloudwatch_aws_rds_swap_usage_maximum: rds 交换使用最大值
cloudwatch_aws_rds_swap_usage_minimum: rds 交换使用量最低
cloudwatch_aws_rds_swap_usage_sample_count: rds 交换使用示例计数
cloudwatch_aws_rds_swap_usage_sum: rds 交换使用总和
cloudwatch_aws_rds_write_iops_average: rds 写入 iops 平均值
cloudwatch_aws_rds_write_iops_maximum: rds 写入 iops 最大值
cloudwatch_aws_rds_write_iops_minimum: rds 写入 iops 最低
cloudwatch_aws_rds_write_iops_sample_count: rds 写入 iops 样本计数
cloudwatch_aws_rds_write_iops_sum: rds 写入 iops 总和
cloudwatch_aws_rds_write_latency_average: rds 写入延迟平均值
cloudwatch_aws_rds_write_latency_maximum: rds 最大写入延迟
cloudwatch_aws_rds_write_latency_minimum: rds 写入延迟最小值
cloudwatch_aws_rds_write_latency_sample_count: rds 写入延迟样本计数
cloudwatch_aws_rds_write_latency_sum: rds 写入延迟总和
cloudwatch_aws_rds_write_throughput_average: rds 写入吞吐量平均值
cloudwatch_aws_rds_write_throughput_maximum: rds 最大写入吞吐量
cloudwatch_aws_rds_write_throughput_minimum: rds 写入吞吐量最小值
cloudwatch_aws_rds_write_throughput_sample_count: rds 写入吞吐量样本计数
cloudwatch_aws_rds_write_throughput_sum: rds 写入吞吐量总和
en:
cpu_usage_idle: "CPU idle percentage (unit: %)"
cpu_usage_active: "CPU usage percentage (unit: %)"
cpu_usage_system: "Percentage of CPU time spent in kernel mode (unit: %)"
cpu_usage_user: "Percentage of CPU time spent in user mode (unit: %)"
cpu_usage_nice: "Percentage of CPU time spent on low-priority user-mode processes, that is, processes whose nice value has been adjusted to between 1 and 19. Note that nice values range from -20 to 19, and a larger value means a lower priority (unit: %)"
cpu_usage_iowait: "Percentage of CPU time spent waiting for I/O (unit: %)"
cpu_usage_irq: "Percentage of CPU time spent handling hardware interrupts (unit: %)"
cpu_usage_softirq: "Percentage of CPU time spent handling software interrupts (unit: %)"
cpu_usage_steal: "Present only in virtual machine environments: percentage of time the CPU is contended for by other virtual machines; a value above 20 indicates severe contention (unit: %)"
cpu_usage_guest: "Percentage of CPU time spent running other operating systems under virtualization, that is, running virtual machines (unit: %)"
cpu_usage_guest_nice: "Percentage of CPU time spent running virtual machines at low priority (unit: %)"
disk_free: "Free space on the disk partition (unit: byte)"
disk_used: "Used space on the disk partition (unit: byte)"
disk_used_percent: "Disk partition usage percentage (unit: %)"
disk_total: "Total size of the disk partition (unit: byte)"
disk_inodes_free: "Number of free inodes on the disk partition"
disk_inodes_used: "Number of used inodes on the disk partition"
disk_inodes_total: "Total number of inodes on the disk partition"
diskio_io_time: "Total I/O time from the device's perspective; time is counted whenever there is an I/O request in the queue (unit: millisecond). Counter type; apply a rate function for it to be useful"
diskio_iops_in_progress: "I/O requests that have been issued to the device driver but have not yet completed; excludes requests that are queued but not yet issued to the device driver. Gauge type"
diskio_merged_reads: "Number of times adjacent read requests were merged. Counter type"
diskio_merged_writes: "Number of times adjacent write requests were merged. Counter type"
diskio_read_bytes: "Number of bytes read. Counter type; apply a rate function for it to be useful"
diskio_read_time: "Total time spent on read requests (unit: millisecond). Counter type; apply a rate function for it to be useful"
diskio_reads: "Number of read requests. Counter type; apply a rate function for it to be useful"
diskio_weighted_io_time: "Total I/O wait time from the perspective of I/O requests; when several I/O requests are in flight at once, their times add up (unit: millisecond)"
diskio_write_bytes: "Number of bytes written. Counter type; apply a rate function for it to be useful"
diskio_write_time: "Total time spent on write requests (unit: millisecond). Counter type; apply a rate function for it to be useful"
diskio_writes: "Number of write requests. Counter type; apply a rate function for it to be useful"
kernel_boot_time: "Kernel boot time"
kernel_context_switches: "Number of kernel context switches"
kernel_entropy_avail: "Entropy pool inside the Linux system"
kernel_interrupts: "Number of kernel interrupts"
kernel_processes_forked: "Number of forked processes"
mem_active: "Total memory in active use (including cache and buffer memory)"
mem_available: "Memory available to applications"
mem_available_percent: "Percentage of memory available (0 ~ 100)"
mem_buffered: "Memory used for file buffers"
mem_cached: "Memory used by the cache memory (equal to diskcache minus SwapCache)"
mem_commit_limit: "Total memory currently available for allocation on the system, based on the overcommit ratio ('vm.overcommit_ratio'); this limit applies only in mode 2 ('vm.overcommit_memory')"
mem_committed_as: "Memory currently allocated on the system; the sum of the memory requested by all processes"
mem_dirty: "Memory waiting to be written back to disk"
mem_free: "Free memory"
mem_high_free: "Unused high memory"
mem_high_total: "Total high memory (Highmem refers to all physical memory above 860 MB; the Highmem area is used by user programs or for the page cache. This area is not mapped directly into kernel space, so the kernel must use different methods to access it)"
mem_huge_page_size: "Size of each huge page"
mem_huge_pages_free: "Number of HugePages in the pool that have not been allocated"
mem_huge_pages_total: "Total number of reserved HugePages"
mem_inactive: "Idle memory (including free and available memory)"
mem_low_free: "Unused low memory"
mem_low_total: "Total low memory; low memory can serve the same purposes as high memory, and the kernel can also use it to hold some of its own data structures"
mem_mapped: "Memory used for mapped devices and files"
mem_page_tables: "Size of the page tables that index memory pages"
mem_shared: "Total memory shared by multiple processes"
mem_slab: "Size of the kernel data structure cache, which reduces the overhead of allocating and freeing memory"
mem_sreclaimable: "Reclaimable part of Slab"
mem_sunreclaim: "Unreclaimable part of Slab (SUnreclaim + SReclaimable = Slab)"
mem_swap_cached: "Swap space used by the cache memory: memory that has been swapped out but is still kept in the swapfile, so that it can be replaced quickly when needed without further I/O"
mem_swap_free: "Unused swap space"
mem_swap_total: "Total swap space"
mem_total: "Total memory"
mem_used: "Used memory"
mem_used_percent: "Percentage of memory used (0 ~ 100)"
mem_vmalloc_chunk: "Largest contiguous unused vmalloc area"
mem_vmalloc_totalL: "Total virtual memory available to vmalloc"
mem_vmalloc_used: "Virtual memory used by vmalloc"
mem_write_back: "Memory currently being written back to disk"
mem_write_back_tmp: "Memory used by FUSE for temporary writeback buffers"
net_bytes_recv: "Total bytes received by the network interface (bytes)"
net_bytes_sent: "Total bytes sent by the network interface (bytes)"
net_drop_in: "Number of incoming packets dropped by the network interface"
net_drop_out: "Number of outgoing packets dropped by the network interface"
net_err_in: "Number of receive errors on the network interface"
net_err_out: "Number of transmit errors on the network interface"
net_packets_recv: "Number of packets received by the network interface"
net_packets_sent: "Number of packets sent by the network interface"
netstat_tcp_established: "Number of network connections in the ESTABLISHED state"
netstat_tcp_fin_wait1: "Number of network connections in the FIN_WAIT1 state"
netstat_tcp_fin_wait2: "Number of network connections in the FIN_WAIT2 state"
netstat_tcp_last_ack: "Number of network connections in the LAST_ACK state"
netstat_tcp_listen: "Number of network connections in the LISTEN state"
netstat_tcp_syn_recv: "Number of network connections in the SYN_RECV state"
netstat_tcp_syn_sent: "Number of network connections in the SYN_SENT state"
netstat_tcp_time_wait: "Number of network connections in the TIME_WAIT state"
netstat_udp_socket: "Number of UDP network connections"
processes_blocked: "Number of processes in uninterruptible sleep ('U','D','L')"
processes_dead: "Number of processes being reaped ('X')"
processes_idle: "Number of suspended idle processes ('I')"
processes_paging: "Number of paging processes ('P')"
processes_running: "Number of running processes ('R')"
processes_sleeping: "Number of processes in interruptible sleep ('S')"
processes_stopped: "Number of stopped processes ('T')"
processes_total: "Total number of processes"
processes_total_threads: "Total number of threads"
processes_unknown: "Number of processes in an unknown state"
processes_zombies: "Number of zombie processes ('Z')"
swap_used_percent: "Amount of data swapped out to swap space"
system_load1: "1-minute load average"
system_load5: "5-minute load average"
system_load15: "15-minute load average"
system_n_users: "Number of users"
system_n_cpus: "Number of CPU cores"
system_uptime: "System uptime"
nginx_accepts: "Total number of client connections accepted since nginx started"
nginx_active: "Number of active connections nginx is currently handling; equal to the sum of Reading, Writing, and Waiting"
nginx_handled: "Total number of client connections handled since nginx started"
nginx_reading: "Number of connections where nginx is reading the HTTP request header"
nginx_requests: "Total number of client requests handled since nginx started; because of HTTP Keep-Alive requests, this value can be greater than the handled value"
nginx_upstream_check_fall: "Number of backend failures detected by the upstream_check module"
nginx_upstream_check_rise: "Number of backend checks performed by the upstream_check module"
nginx_upstream_check_status_code: "Status of the upstream backend: 1 for up, 0 for down"
nginx_waiting: "With keep-alive enabled, this value equals active - (reading + writing): resident connections that nginx has finished processing and that are waiting for the next request"
nginx_writing: "Number of connections where nginx is sending a response to the client"
http_response_content_length: "Transfer length of the HTTP message body"
http_response_http_response_code: "HTTP response status code"
http_response_response_time: "HTTP response time"
http_response_result_code: "URL probe result: 0 means normal; any other value means the URL cannot be accessed"
http_response_content_length: HTTP消息实体的传输长度
http_response_http_response_code: http响应状态码
http_response_response_time: http响应用时
http_response_result_code: url探测结果0为正常否则url无法访问
# [mysqld_exporter]
mysql_global_status_uptime: The number of seconds that the server has been up.(Gauge)
@@ -489,7 +237,7 @@ redis_last_key_groups_scrape_duration_milliseconds: Duration of the last key gro
redis_last_slow_execution_duration_seconds: The amount of time needed for last slow execution, in seconds.
redis_latest_fork_seconds: The amount of time needed for last fork, in seconds.
redis_lazyfree_pending_objects: The number of objects waiting to be freed (as a result of calling UNLINK, or FLUSHDB and FLUSHALL with the ASYNC option).
redis_master_repl_offset: The server's current replication offset.
redis_master_repl_offset: The server's current replication offset.
redis_mem_clients_normal: Memory used by normal clients.(Gauge)
redis_mem_clients_slaves: Memory used by replica clients - Starting Redis 7.0, replica buffers share memory with the replication backlog, so this field can show 0 when replicas don't trigger an increase of memory usage.
redis_mem_fragmentation_bytes: Delta between used_memory_rss and used_memory. Note that when the total fragmentation bytes is low (few megabytes), a high ratio (e.g. 1.5 and above) is not an indication of an issue.
@@ -622,6 +370,8 @@ node_load15: cpu load 15m
# MEM
# kernel mode
# memory used to track pages that have been fetched from swap but not yet modified
node_memory_SwapCached_bytes: Memory that keeps track of pages that have been fetched from swap but not yet been modified
# memory the kernel uses to cache data structures for its own use
node_memory_Slab_bytes: Memory used by the kernel to cache data structures for its own use
# reclaimable part of slab
@@ -683,7 +433,7 @@ node_memory_SwapTotal_bytes: Memory information field SwapTotal_bytes
node_memory_SwapFree_bytes: Memory information field SwapFree_bytes
# DISK
node_filesystem_avail_bytes: Filesystem space available to non-root users in byte
node_filesystem_files_free: Filesystem space available to non-root users in byte
node_filesystem_free_bytes: Filesystem free space in bytes
node_filesystem_size_bytes: Filesystem size in bytes
node_filesystem_files_free: Filesystem total free file nodes
@@ -729,7 +479,7 @@ kafka_consumer_lag_millis: Current approximation of consumer lag for a ConsumerG
kafka_topic_partition_under_replicated_partition: 1 if Topic/Partition is under Replicated
# [zookeeper_exporter]
zk_znode_count: The total count of znodes stored
zk_znode_count: The total count of znodes stored
zk_ephemerals_count: The number of Ephemerals nodes
zk_watch_count: The number of watchers setup over Zookeeper nodes.
zk_approximate_data_size: Size of data in bytes that a zookeeper server has in its data tree
@@ -741,4 +491,4 @@ zk_open_file_descriptor_count: Number of file descriptors that a zookeeper serve
zk_max_file_descriptor_count: Maximum number of file descriptors that a zookeeper server can open
zk_avg_latency: Average time in milliseconds for requests to be processed
zk_min_latency: Minimum time in milliseconds for a request to be processed
zk_max_latency: Maximum time in milliseconds for a request to be processed
zk_max_latency: Maximum time in milliseconds for a request to be processed

View File

@@ -24,11 +24,6 @@ class Sender(object):
# already done in go code
pass
@classmethod
def send_mm(cls, payload):
# already done in go code
pass
@classmethod
def send_sms(cls, payload):
users = payload.get('event').get("notify_users_obj")

View File

@@ -7,6 +7,13 @@ import (
"github.com/tidwall/gjson"
)
// implement this interface so that the caller can invoke the plugin for alert notifications
type inter interface {
Descript() string
Notify([]byte)
NotifyMaintainer([]byte)
}
// N9E complete
type N9EPlugin struct {
Name string
@@ -23,7 +30,6 @@ func (n *N9EPlugin) Notify(bs []byte) {
"dingtalk_robot_token",
"wecom_robot_token",
"feishu_robot_token",
"telegram_robot_token",
}
for _, ch := range channels {
if ret := gjson.GetBytes(bs, ch); ret.Exists() {
@@ -42,6 +48,6 @@ func (n *N9EPlugin) NotifyMaintainer(bs []byte) {
// N9eCaller will be loaded for alertingCall; the first letter must be capitalized so that it is exported
var N9eCaller = N9EPlugin{
Name: "N9EPlugin",
Description: "Notify by lib",
Description: "Notification by lib",
BuildAt: time.Now().Local().Format("2006/01/02 15:04:05"),
}
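The `Notify` loop above checks, for each contact key, whether that key exists in the JSON payload. A minimal Python sketch of the same check follows. It uses the standard `json` module in place of gjson, and both the payload shape and the `present_channels` helper are assumptions made for illustration, not the plugin's actual input or API.

```python
import json

# Three of the contact keys listed in the Go code above.
CHANNELS = [
    "dingtalk_robot_token",
    "wecom_robot_token",
    "feishu_robot_token",
]

def present_channels(payload_bytes):
    """Return the contact keys that exist at the top level of the JSON payload."""
    payload = json.loads(payload_bytes)
    return [ch for ch in CHANNELS if ch in payload]

# Hypothetical payload: only the DingTalk and Feishu tokens are set.
sample = b'{"dingtalk_robot_token": "abc", "feishu_robot_token": "xyz"}'
print(present_channels(sample))
```

Only keys that are present are returned, in the order of `CHANNELS`, which mirrors the `ret.Exists()` test in the Go loop.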

View File

@@ -1,193 +0,0 @@
import json
import yaml
'''
Convert Prometheus/vmalert rules into n9e rules.
Rule ConfigMaps from Kubernetes are also supported.
'''
rule_file = 'rules.yaml'
def convert_interval(interval):
    if interval.endswith('s') or interval.endswith('S'):
        return int(interval[:-1])
    if interval.endswith('m') or interval.endswith('M'):
        return int(interval[:-1]) * 60
    if interval.endswith('h') or interval.endswith('H'):
        return int(interval[:-1]) * 60 * 60
    if interval.endswith('d') or interval.endswith('D'):
        return int(interval[:-1]) * 60 * 60 * 24
    return int(interval)
def convert_alert(rule, interval):
    name = rule['alert']
    prom_ql = rule['expr']
    if 'for' in rule:
        prom_for_duration = convert_interval(rule['for'])
    else:
        prom_for_duration = 0
    prom_eval_interval = convert_interval(interval)
    note = ''
    if 'annotations' in rule:
        for v in rule['annotations'].values():
            note = v
            break
    append_tags = []
    severity = 2
    if 'labels' in rule:
        for k, v in rule['labels'].items():
            if k != 'severity':
                append_tags.append('{}={}'.format(k, v))
                continue
            if v == 'critical':
                severity = 1
            elif v == 'info':
                severity = 3
            # elif v == 'warning':
            #     severity = 2
    n9e_alert_rule = {
        "name": name,
        "note": note,
        "severity": severity,
        "disabled": 0,
        "prom_for_duration": prom_for_duration,
        "prom_ql": prom_ql,
        "prom_eval_interval": prom_eval_interval,
        "enable_stime": "00:00",
        "enable_etime": "23:59",
        "enable_days_of_week": [
            "1",
            "2",
            "3",
            "4",
            "5",
            "6",
            "0"
        ],
        "enable_in_bg": 0,
        "notify_recovered": 1,
        "notify_channels": [],
        "notify_repeat_step": 60,
        "recover_duration": 0,
        "callbacks": [],
        "runbook_url": "",
        "append_tags": append_tags
    }
    return n9e_alert_rule
def convert_record(rule, interval):
    name = rule['record']
    prom_ql = rule['expr']
    prom_eval_interval = convert_interval(interval)
    note = ''
    append_tags = []
    if 'labels' in rule:
        for k, v in rule['labels'].items():
            append_tags.append('{}={}'.format(k, v))
    n9e_record_rule = {
        "name": name,
        "note": note,
        "disabled": 0,
        "prom_ql": prom_ql,
        "prom_eval_interval": prom_eval_interval,
        "append_tags": append_tags
    }
    return n9e_record_rule
'''
example of rule group file
---
groups:
- name: example
  rules:
  - alert: HighRequestLatency
    expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
    for: 10m
    labels:
      severity: page
    annotations:
      summary: High request latency
'''
def deal_group(group):
    """
    parse single prometheus/vmalert rule group
    """
    alert_rules = []
    record_rules = []
    for rule_segment in group['groups']:
        if 'interval' in rule_segment:
            interval = rule_segment['interval']
        else:
            interval = '15s'
        for rule in rule_segment['rules']:
            if 'alert' in rule:
                alert_rules.append(convert_alert(rule, interval))
            else:
                record_rules.append(convert_record(rule, interval))
    return alert_rules, record_rules
'''
example of k8s rule configmap
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: rulefiles-0
data:
  etcdrules.yaml: |
    groups:
    - name: etcd
      rules:
      - alert: etcdInsufficientMembers
        annotations:
          message: 'etcd cluster "{{ $labels.job }}": insufficient members ({{ $value}}).'
        expr: sum(up{job=~".*etcd.*"} == bool 1) by (job) < ((count(up{job=~".*etcd.*"})
          by (job) + 1) / 2)
        for: 3m
        labels:
          severity: critical
'''
def deal_configmap(rule_configmap):
    """
    parse rule configmap from k8s
    """
    all_record_rules = []
    all_alert_rules = []
    for _, rule_group_str in rule_configmap['data'].items():
        rule_group = yaml.load(rule_group_str, Loader=yaml.FullLoader)
        alert_rules, record_rules = deal_group(rule_group)
        all_alert_rules.extend(alert_rules)
        all_record_rules.extend(record_rules)
    return all_alert_rules, all_record_rules
def main():
    with open(rule_file, 'r') as f:
        rule_config = yaml.load(f, Loader=yaml.FullLoader)
        # If the file is a Kubernetes ConfigMap, use the call below instead
        # alert_rules, record_rules = deal_configmap(rule_config)
        alert_rules, record_rules = deal_group(rule_config)
    with open("alert-rules.json", 'w') as fw:
        json.dump(alert_rules, fw, indent=2, ensure_ascii=False)
    with open("record-rules.json", 'w') as fw:
        json.dump(record_rules, fw, indent=2, ensure_ascii=False)
if __name__ == '__main__':
    main()
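The `convert_interval` helper in the script above handles a single number followed by one unit, such as `10m`. Prometheus durations may also be compound, such as `1h30m`, which that helper would reject. The following is a minimal sketch of a parser for the compound form; `parse_prom_duration` is a hypothetical name, not part of the script, and unlike the original it accepts only lower-case units.

```python
import re

# Seconds per Prometheus duration unit.
_UNIT_SECONDS = {
    "ms": 0.001, "s": 1, "m": 60, "h": 3600,
    "d": 86400, "w": 604800, "y": 31536000,
}
# "ms" is listed first so it is tried before the single-letter "m" and "s".
_TOKEN = re.compile(r"(\d+)(ms|s|m|h|d|w|y)")

def parse_prom_duration(text):
    """Convert a duration such as '15s', '10m' or '1h30m' to whole seconds."""
    text = text.strip()
    if text.isdigit():
        # A bare number is treated as seconds, as in convert_interval.
        return int(text)
    pos = 0
    total = 0.0
    for match in _TOKEN.finditer(text):
        if match.start() != pos:
            raise ValueError("invalid duration: {!r}".format(text))
        total += int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):
        raise ValueError("invalid duration: {!r}".format(text))
    return int(total)

print(parse_prom_duration("1h30m"))
```

Sub-second values are truncated to whole seconds, since the n9e rule fields in the script are integer seconds.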

View File

@@ -9,15 +9,9 @@ ClusterName = "Default"
BusiGroupLabelKey = "busigroup"
# sleep x seconds, then start judge engine
EngineDelay = 30
EngineDelay = 120
DisableUsageReport = true
# config | database
ReaderFrom = "config"
# if true, target tags can rewrite labels defined in categraf config file
LabelRewrite = false
DisableUsageReport = false
[Log]
# log write dir
@@ -76,12 +70,10 @@ InsecureSkipVerify = true
Batch = 5
[Alerting]
# timeout settings, unit: ms, default: 30000ms
Timeout=30000
TemplatesDir = "./etc/template"
NotifyConcurrency = 10
# use builtin go code notify
NotifyBuiltinChannels = ["email", "dingtalk", "wecom", "feishu", "mm", "telegram"]
NotifyBuiltinChannels = ["email", "dingtalk", "wecom", "feishu"]
[Alerting.CallScript]
# built in sending capability in go code
@@ -112,7 +104,7 @@ Headers = ["Content-Type", "application/json", "X-From", "N9E"]
[NoData]
Metric = "target_up"
# unit: second
Interval = 120
Interval = 15
[Ibex]
# callback: ${ibex}/${tplid}/${host}
@@ -135,8 +127,6 @@ Address = "127.0.0.1:6379"
RedisType = "standalone"
# Mastername for sentinel type
# MasterName = "mymaster"
# SentinelUsername = ""
# SentinelPassword = ""
[DB]
# postgres: host=%s port=%s user=%s dbname=%s password=%s sslmode=%s
@@ -165,31 +155,23 @@ BasicAuthUser = ""
BasicAuthPass = ""
# timeout settings, unit: ms
Timeout = 30000
DialTimeout = 3000
MaxIdleConnsPerHost = 100
# [[Readers]]
# ClusterName = "Default"
# prometheus base url
# Url = "http://127.0.0.1:9090"
# Basic auth username
# BasicAuthUser = ""
# Basic auth password
# BasicAuthPass = ""
# timeout settings, unit: ms
# Timeout = 30000
# DialTimeout = 3000
# MaxIdleConnsPerHost = 100
DialTimeout = 10000
TLSHandshakeTimeout = 30000
ExpectContinueTimeout = 1000
IdleConnTimeout = 90000
# time duration, unit: ms
KeepAlive = 30000
MaxConnsPerHost = 0
MaxIdleConns = 100
MaxIdleConnsPerHost = 10
[WriterOpt]
# queue channel count
QueueCount = 1000
QueueCount = 100
# queue max size
QueueMaxSize = 1000000
QueueMaxSize = 200000
# number of samples popped from the queue at a time
QueuePopSize = 1000
# metric or ident
ShardingKey = "ident"
QueuePopSize = 2000
[[Writers]]
Url = "http://127.0.0.1:9090/api/v1/write"
@@ -198,7 +180,6 @@ BasicAuthUser = ""
# Basic auth password
BasicAuthPass = ""
# timeout settings, unit: ms
Headers = ["X-From", "n9e"]
Timeout = 10000
DialTimeout = 3000
TLSHandshakeTimeout = 30000
@@ -209,12 +190,6 @@ KeepAlive = 30000
MaxConnsPerHost = 0
MaxIdleConns = 100
MaxIdleConnsPerHost = 100
# [[Writers.WriteRelabels]]
# Action = "replace"
# SourceLabels = ["__address__"]
# Regex = "([^:]+)(?::\\d+)?"
# Replacement = "$1:80"
# TargetLabel = "__address__"
# [[Writers]]
# Url = "http://127.0.0.1:7201/api/v1/prom/remote/write"
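The `[WriterOpt]` settings in the config above describe a set of bounded queues (`QueueCount`, `QueueMaxSize`), a batch size for consumers (`QueuePopSize`), and a `ShardingKey` of `metric` or `ident`. The diff does not show how the server uses them, so the following is only a sketch of one plausible reading: samples are routed to a queue by hashing the sharding-key value, and writers drain each queue in batches. All names and the CRC32 hash are assumptions.

```python
import zlib
from collections import deque

QUEUE_COUNT = 4        # stands in for QueueCount
QUEUE_MAX_SIZE = 1000  # stands in for QueueMaxSize
QUEUE_POP_SIZE = 3     # stands in for QueuePopSize

queues = [deque() for _ in range(QUEUE_COUNT)]

def push(sample, sharding_key="ident"):
    """Route a sample to a queue by hashing its sharding-key value.

    Returns False (sample dropped) when the target queue is full.
    """
    key = str(sample.get(sharding_key, ""))
    index = zlib.crc32(key.encode("utf-8")) % QUEUE_COUNT
    queue = queues[index]
    if len(queue) >= QUEUE_MAX_SIZE:
        return False
    queue.append(sample)
    return True

def pop_batch(index):
    """Pop at most QUEUE_POP_SIZE samples from one queue, oldest first."""
    queue = queues[index]
    batch = []
    while queue and len(batch) < QUEUE_POP_SIZE:
        batch.append(queue.popleft())
    return batch

for value in range(5):
    push({"ident": "host1", "metric": "cpu_usage_idle", "value": value})
```

Hashing on `ident` keeps all samples from one host in the same queue, which preserves their order through a single writer. Hashing on `metric` would spread one host's samples across queues instead.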

View File

@@ -1,26 +0,0 @@
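# Alert message template files
The variables available in a template are the fields of the `AlertCurEvent` object.
For the template syntax, see [html/template](https://pkg.go.dev/html/template).
## How to add a monitoring-details URL to an alert template
Assume the web address is http://127.0.0.1:18000/; in practice, replace it with your own web address.
Add the following lines to the template:
* dingtalk / wecom / feishu
```markdown
[Monitoring details](http://127.0.0.1:18000/metric/explorer?promql={{ .PromQl | escape }})
```
* mailbody
```html
<tr>
<th>Monitoring details:</th>
<td>
<a href="http://127.0.0.1:18000/metric/explorer?promql={{ .PromQl | escape }}" target="_blank">Click to view</a>
</td>
</tr>
```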
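The `escape` step in the links above matters because PromQL contains characters such as `{`, `"` and spaces that are not safe in a URL query. The sketch below shows the resulting link in Python, assuming `escape` performs standard percent-encoding (the README does not say which Go function backs it). `explorer_link` is a hypothetical helper.

```python
from urllib.parse import quote

WEB_BASE = "http://127.0.0.1:18000"  # placeholder address used in the README

def explorer_link(promql, base=WEB_BASE):
    """Build a metric-explorer URL with the PromQL percent-encoded."""
    return "{}/metric/explorer?promql={}".format(base, quote(promql, safe=""))

print(explorer_link('cpu_usage_idle{ident="host1"} < 10'))
```

Without the encoding, the quotes and braces in the expression could end the Markdown link or the HTML attribute early.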

View File

@@ -1,7 +0,0 @@
级别状态: S{{.Severity}} {{if .IsRecovered}}Recovered{{else}}Triggered{{end}}
规则名称: {{.RuleName}}{{if .RuleNote}}
规则备注: {{.RuleNote}}{{end}}
监控指标: {{.TagsJSON}}
{{if .IsRecovered}}恢复时间:{{timeformat .LastEvalTime}}{{else}}触发时间: {{timeformat .TriggerTime}}
触发时值: {{.TriggerValue}}{{end}}
发送时间: {{timestamp}}

View File

@@ -1,9 +0,0 @@
**级别状态**: {{if .IsRecovered}}<font color="info">S{{.Severity}} Recovered</font>{{else}}<font color="warning">S{{.Severity}} Triggered</font>{{end}}
**规则标题**: {{.RuleName}}{{if .RuleNote}}
**规则备注**: {{.RuleNote}}{{end}}{{if .TargetIdent}}
**监控对象**: {{.TargetIdent}}{{end}}
**监控指标**: {{.TagsJSON}}{{if not .IsRecovered}}
**触发时值**: {{.TriggerValue}}{{end}}
{{if .IsRecovered}}**恢复时间**: {{timeformat .LastEvalTime}}{{else}}**首次触发时间**: {{timeformat .FirstTriggerTime}}{{end}}
{{$time_duration := sub now.Unix .FirstTriggerTime }}{{if .IsRecovered}}{{$time_duration = sub .LastEvalTime .FirstTriggerTime }}{{end}}**持续时长**: {{humanizeDurationInterface $time_duration}}
**发送时间**: {{timestamp}}

View File

@@ -1,9 +1,7 @@
**级别状态**: {{if .IsRecovered}}<font color="info">S{{.Severity}} Recovered</font>{{else}}<font color="warning">S{{.Severity}} Triggered</font>{{end}}
**规则标题**: {{.RuleName}}{{if .RuleNote}}
**规则备注**: {{.RuleNote}}{{end}}{{if .TargetIdent}}
**监控对象**: {{.TargetIdent}}{{end}}
**监控指标**: {{.TagsJSON}}{{if not .IsRecovered}}
**规则备注**: {{.RuleNote}}{{end}}
**监控指标**: {{.TagsJSON}}
{{if .IsRecovered}}**恢复时间**{{timeformat .LastEvalTime}}{{else}}**触发时间**: {{timeformat .TriggerTime}}
**触发时值**: {{.TriggerValue}}{{end}}
{{if .IsRecovered}}**恢复时间**: {{timeformat .LastEvalTime}}{{else}}**首次触发时间**: {{timeformat .FirstTriggerTime}}{{end}}
{{$time_duration := sub now.Unix .FirstTriggerTime }}{{if .IsRecovered}}{{$time_duration = sub .LastEvalTime .FirstTriggerTime }}{{end}}**持续时长**: {{humanizeDurationInterface $time_duration}}
**发送时间**: {{timestamp}}

View File

@@ -4,9 +4,6 @@ RunMode = "release"
# # custom i18n dict config
# I18N = "./etc/i18n.json"
# # custom i18n request header key
# I18NHeaderKey = "X-Language"
# metrics descriptions
MetricsYamlFile = "./etc/metrics.yaml"
@@ -39,16 +36,6 @@ Label = "飞书机器人"
# do not change Key
Key = "feishu"
[[NotifyChannels]]
Label = "mm bot"
# do not change Key
Key = "mm"
[[NotifyChannels]]
Label = "telegram机器人"
# do not change Key
Key = "telegram"
[[ContactKeys]]
Label = "Wecom Robot Token"
# do not change Key
@@ -64,16 +51,6 @@ Label = "Feishu Robot Token"
# do not change Key
Key = "feishu_robot_token"
[[ContactKeys]]
Label = "MatterMost Webhook URL"
# do not change Key
Key = "mm_webhook_url"
[[ContactKeys]]
Label = "Telegram Robot Token"
# do not change Key
Key = "telegram_robot_token"
[Log]
# log write dir
Dir = "logs"
@@ -159,7 +136,6 @@ Email = "mail"
[OIDC]
Enable = false
DisplayName = "OIDC登录"
RedirectURL = "http://n9e.com/callback"
SsoAddr = "http://sso.example.org"
ClientId = ""
@@ -172,54 +148,6 @@ Nickname = "nickname"
Phone = "phone_number"
Email = "email"
[CAS]
Enable = false
DisplayName = "CAS登录"
SsoAddr = "https://cas.example.com/cas/"
RedirectURL = "http://127.0.0.1:18000/callback/cas"
CoverAttributes = false
# cas user default roles
DefaultRoles = ["Standard"]
[CAS.Attributes]
Nickname = "nickname"
Phone = "phone_number"
Email = "email"
[OAuth]
Enable = false
DisplayName = "OAuth2登录"
RedirectURL = "http://127.0.0.1:18000/callback/oauth"
SsoAddr = "https://sso.example.com/oauth2/authorize"
TokenAddr = "https://sso.example.com/oauth2/token"
UserInfoAddr = "https://api.example.com/api/v1/user/info"
# "header" "querystring" "formdata"
TranTokenMethod = "header"
ClientId = ""
ClientSecret = ""
CoverAttributes = true
DefaultRoles = ["Standard"]
UserinfoIsArray = false
UserinfoPrefix = "data"
Scopes = ["profile", "email", "phone"]
[OAuth.Attributes]
# Username must be defined
Username = "username"
Nickname = "nickname"
Phone = "phone_number"
Email = "email"
# example
# # nested : UserinfoIsArray=false, UserinfoPrefix="data"
# # {"data":{"username":"123456","nickname":"姓名"},"code":0,"message":"ok"}
# # nested and array : UserinfoIsArray=true, UserinfoPrefix="data"
# # {"data":[{"username":"123456","nickname":"姓名"}],"code":0,"message":"ok"}
# # flat : UserinfoIsArray=false, UserinfoPrefix=""
# # {"username":"123456","nickname":"姓名"}
# # flat and array : UserinfoIsArray=true, UserinfoPrefix=""
# # [{"username":"123456","nickname":"姓名"}]
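The four commented examples above describe how `UserinfoPrefix` and `UserinfoIsArray` select the user object from the response of the user-info endpoint. A minimal sketch of that selection in Python follows; `extract_userinfo` is a hypothetical helper, and the server's real handling (error cases in particular) is not shown in the diff.

```python
import json

def extract_userinfo(body, prefix, is_array):
    """Pick the user object out of a user-info response body.

    prefix   -- value of UserinfoPrefix ("" means the payload is flat)
    is_array -- value of UserinfoIsArray (take the first element if true)
    """
    data = json.loads(body)
    if prefix:
        data = data[prefix]
    if is_array:
        data = data[0]
    return data

nested = '{"data":{"username":"123456","nickname":"姓名"},"code":0,"message":"ok"}'
print(extract_userinfo(nested, prefix="data", is_array=False)["username"])
```

The same function covers the other three layouts by changing only the two arguments, which is why the config exposes them as separate options.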
[Redis]
# address, ip:port or ip1:port,ip2:port for cluster and sentinel(SentinelAddrs)
Address = "127.0.0.1:6379"
@@ -232,8 +160,6 @@ Address = "127.0.0.1:6379"
RedisType = "standalone"
# Mastername for sentinel type
# MasterName = "mymaster"
# SentinelUsername = ""
# SentinelPassword = ""
[DB]
DSN="root:1234@tcp(127.0.0.1:3306)/n9e_v5?charset=utf8mb4&parseTime=True&loc=Local&allowNativePasswords=true"
@@ -265,7 +191,6 @@ BasicAuthPass = ""
Timeout = 30000
DialTimeout = 3000
MaxIdleConnsPerHost = 100
Headers = ["X-From", "n9e"]
[Ibex]
Address = "http://127.0.0.1:10090"
@@ -273,10 +198,4 @@ Address = "http://127.0.0.1:10090"
BasicAuthUser = "ibex"
BasicAuthPass = "ibex"
# unit: ms
Timeout = 3000
[TargetMetrics]
TargetUp = '''max(max_over_time(target_up{ident=~"(%s)"}[%dm])) by (ident)'''
LoadPerCore = '''max(max_over_time(system_load_norm_1{ident=~"(%s)"}[%dm])) by (ident)'''
MemUtil = '''100-max(max_over_time(mem_available_percent{ident=~"(%s)"}[%dm])) by (ident)'''
DiskUtil = '''max(max_over_time(disk_used_percent{ident=~"(%s)", path="/"}[%dm])) by (ident)'''
Timeout = 3000
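The `[TargetMetrics]` entries above are PromQL templates with a `%s` and a `%d` placeholder. A minimal sketch of filling one in follows, assuming `%s` takes a `|`-joined list of idents (it sits inside a `=~` regex matcher) and `%d` takes a window in minutes. `build_query` is a hypothetical helper, and idents are not regex-escaped here.

```python
# Template copied from the TargetUp entry in the config.
TARGET_UP = 'max(max_over_time(target_up{ident=~"(%s)"}[%dm])) by (ident)'

def build_query(template, idents, minutes):
    """Fill a TargetMetrics template with an ident alternation and a time window."""
    return template % ("|".join(idents), minutes)

print(build_query(TARGET_UP, ["host1", "host2"], 5))
```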

16
go.mod
View File

@@ -6,9 +6,9 @@ require (
github.com/coreos/go-oidc v2.2.1+incompatible
github.com/dgrijalva/jwt-go v3.2.0+incompatible
github.com/gin-contrib/pprof v1.3.0
github.com/gin-gonic/gin v1.7.7
github.com/gin-gonic/gin v1.7.4
github.com/go-ldap/ldap/v3 v3.4.1
github.com/go-redis/redis/v9 v9.0.0-rc.1
github.com/go-redis/redis/v8 v8.11.3
github.com/gogo/protobuf v1.3.2
github.com/golang-jwt/jwt v3.2.2+incompatible
github.com/golang/protobuf v1.5.2
@@ -16,7 +16,6 @@ require (
github.com/google/uuid v1.3.0
github.com/json-iterator/go v1.1.12
github.com/koding/multiconfig v0.0.0-20171124222453-69c27309b2d7
github.com/mailru/easyjson v0.7.7
github.com/mattn/go-isatty v0.0.12
github.com/orcaman/concurrent-map v0.0.0-20210501183033-44dafcb38ecc
github.com/pkg/errors v0.9.1
@@ -24,7 +23,7 @@ require (
github.com/prometheus/common v0.32.1
github.com/prometheus/prometheus v2.5.0+incompatible
github.com/tidwall/gjson v1.14.0
github.com/toolkits/pkg v1.3.3
github.com/toolkits/pkg v1.2.9
github.com/urfave/cli/v2 v2.3.0
golang.org/x/oauth2 v0.0.0-20210514164344-f6687ab2804c
gopkg.in/gomail.v2 v2.0.0-20160411212932-81ebce5c23df
@@ -59,7 +58,6 @@ require (
github.com/jackc/pgx/v4 v4.13.0 // indirect
github.com/jinzhu/inflection v1.0.0 // indirect
github.com/jinzhu/now v1.1.2 // indirect
github.com/josharian/intern v1.0.0 // indirect
github.com/leodido/go-urn v1.2.0 // indirect
github.com/matttproud/golang_protobuf_extensions v1.0.1 // indirect
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
@@ -74,10 +72,10 @@ require (
github.com/tidwall/pretty v1.2.0 // indirect
github.com/ugorji/go/codec v1.1.7 // indirect
go.uber.org/automaxprocs v1.4.0 // indirect
golang.org/x/crypto v0.0.0-20210921155107-089bfa567519 // indirect
golang.org/x/net v0.0.0-20220722155237-a158d28d115b // indirect
golang.org/x/sys v0.0.0-20220722155257-8c9f86f7a55f // indirect
golang.org/x/text v0.3.8 // indirect
golang.org/x/crypto v0.0.0-20210817164053-32db794688a5 // indirect
golang.org/x/net v0.0.0-20210805182204-aaa1db679c0d // indirect
golang.org/x/sys v0.0.0-20220114195835-da31bd327af9 // indirect
golang.org/x/text v0.3.7 // indirect
google.golang.org/appengine v1.6.6 // indirect
google.golang.org/genproto v0.0.0-20211007155348-82e027067bd4 // indirect
google.golang.org/grpc v1.41.0 // indirect

70
go.sum
View File

@@ -89,7 +89,9 @@ github.com/fatih/camelcase v1.0.0 h1:hxNvNX/xYBp0ovncs8WyWZrOrpBNub/JfaMvbURyft8
github.com/fatih/camelcase v1.0.0/go.mod h1:yN2Sb0lFhZJUdVvtELVWefmrXpuZESvPmqwoZc+/fpc=
github.com/fatih/structs v1.1.0 h1:Q7juDM0QtcnhCpeyLGQKyg4TOIghuNXrkL32pHAUMxo=
github.com/fatih/structs v1.1.0/go.mod h1:9NiDSp5zOcgEDl+j00MP/WkGVPOlPRLejGD8Ga6PJ7M=
github.com/fsnotify/fsnotify v1.4.7/go.mod h1:jwhsz4b93w/PPRr/qN1Yymfu8t87LnFCMoQvtojpjFo=
github.com/fsnotify/fsnotify v1.4.9 h1:hsms1Qyu0jgnwNXIxa+/V/PDsU6CfLf6CNO8H7IWoS4=
github.com/fsnotify/fsnotify v1.4.9/go.mod h1:znqG4EE+3YCdAaPaxE2ZRY/06pZUdp0tY4IgpuI1SZQ=
github.com/garyburd/redigo v1.6.2/go.mod h1:NR3MbYisc3/PwhQ00EMzDiPmrwpPxAn5GI05/YaO1SY=
github.com/ghodss/yaml v1.0.0/go.mod h1:4dBDuWmgqj2HViK6kFavaiC9ZROes6MMH2rRYeMEF04=
github.com/gin-contrib/pprof v1.3.0 h1:G9eK6HnbkSqDZBYbzG4wrjCsA4e+cvYAHUZw6W+W9K0=
@@ -97,8 +99,8 @@ github.com/gin-contrib/pprof v1.3.0/go.mod h1:waMjT1H9b179t3CxuG1cV3DHpga6ybizwf
github.com/gin-contrib/sse v0.1.0 h1:Y/yl/+YNO8GZSjAhjMsSuLt29uWRFHdHYUb5lYOV9qE=
github.com/gin-contrib/sse v0.1.0/go.mod h1:RHrZQHXnP2xjPF+u1gW/2HnVO7nvIa9PG3Gm+fLHvGI=
github.com/gin-gonic/gin v1.6.2/go.mod h1:75u5sXoLsGZoRN5Sgbi1eraJ4GU3++wFwWzhwvtwp4M=
github.com/gin-gonic/gin v1.7.7 h1:3DoBmSbJbZAWqXJC3SLjAPfutPJJRN1U5pALB7EeTTs=
github.com/gin-gonic/gin v1.7.7/go.mod h1:axIBovoeJpVj8S3BwE0uPMTeReE4+AfFtqpqaZ1qq1U=
github.com/gin-gonic/gin v1.7.4 h1:QmUZXrvJ9qZ3GfWvQ+2wnW/1ePrTEJqPKMYEU3lD/DM=
github.com/gin-gonic/gin v1.7.4/go.mod h1:jD2toBW3GZUr5UMcdrwQA10I7RuaFOl/SGeDjXkfUtY=
github.com/go-asn1-ber/asn1-ber v1.5.1 h1:pDbRAunXzIUXfx4CB2QJFv5IuPiuoW+sWvr/Us009o8=
github.com/go-asn1-ber/asn1-ber v1.5.1/go.mod h1:hEBeB/ic+5LoWskz+yKT7vGhhPYkProFKoKdwZRWMe0=
github.com/go-gl/glfw v0.0.0-20190409004039-e6da0acd62b1/go.mod h1:vR7hzQXu2zJy9AVAgeJqvqgH9Q5CA+iKCZ2gyEVpxRU=
@@ -121,11 +123,12 @@ github.com/go-playground/universal-translator v0.17.0/go.mod h1:UkSxE5sNxxRwHyU+
github.com/go-playground/validator/v10 v10.2.0/go.mod h1:uOYAAleCW8F/7oMFd6aG0GOhaH6EGOAJShg8Id5JGkI=
github.com/go-playground/validator/v10 v10.4.1 h1:pH2c5ADXtd66mxoE0Zm9SUhxE20r7aM3F26W0hOn+GE=
github.com/go-playground/validator/v10 v10.4.1/go.mod h1:nlOn6nFhuKACm19sB/8EGNn9GlaMV7XkbRSipzJ0Ii4=
github.com/go-redis/redis/v9 v9.0.0-rc.1 h1:/+bS+yeUnanqAbuD3QwlejzQZ+4eqgfUtFTG4b+QnXs=
github.com/go-redis/redis/v9 v9.0.0-rc.1/go.mod h1:8et+z03j0l8N+DvsVnclzjf3Dl/pFHgRk+2Ct1qw66A=
github.com/go-redis/redis/v8 v8.11.3 h1:GCjoYp8c+yQTJfc0n69iwSiHjvuAdruxl7elnZCxgt8=
github.com/go-redis/redis/v8 v8.11.3/go.mod h1:xNJ9xDG09FsIPwh3bWdk+0oDWHbtF9rPN0F/oD9XeKc=
github.com/go-sql-driver/mysql v1.6.0 h1:BCTh4TKNUYmOmMUcQ3IipzF5prigylS7XXjEkfCHuOE=
github.com/go-sql-driver/mysql v1.6.0/go.mod h1:DCzpHaOWr8IXmIStZouvnhqoel9Qv2LBy8hT2VhHyBg=
github.com/go-stack/stack v1.8.0/go.mod h1:v0f6uXyyMGvRgIKkXu+yp6POWl0qKG85gN/melR3HDY=
github.com/go-task/slim-sprig v0.0.0-20210107165309-348f09dbbbc0/go.mod h1:fyg7847qk6SyHyPtNmDHnmrv/HOrqktSC+C9fM+CJOE=
github.com/gofrs/uuid v4.0.0+incompatible h1:1SD/1F5pU8p29ybwgQSwpQk+mwdRrXCYuPhW6m+TnJw=
github.com/gofrs/uuid v4.0.0+incompatible/go.mod h1:b2aQJv3Z4Fp6yNu3cdSllBxTCLRxnplIgP/c0N/04lM=
github.com/gogo/protobuf v1.1.1/go.mod h1:r8qH/GZQm5c6nD/R0oafs1akxWv10x8SbQlK7atdtwQ=
@@ -174,7 +177,8 @@ github.com/google/go-cmp v0.5.0/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/
github.com/google/go-cmp v0.5.1/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE=
github.com/google/go-cmp v0.5.4/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE=
github.com/google/go-cmp v0.5.5/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE=
github.com/google/go-cmp v0.5.8 h1:e6P7q2lk1O+qJJb4BtCQXlK8vWEO8V1ZeuEdJNOqZyg=
github.com/google/go-cmp v0.5.6 h1:BKbKCqvP6I+rmFHt06ZmyQtvB8xAkWdhFyr0ZUNZcxQ=
github.com/google/go-cmp v0.5.6/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE=
github.com/google/gofuzz v1.0.0/go.mod h1:dBl0BpW6vV/+mYPU4Po3pmUjxk6FQPldtuIdl/M65Eg=
github.com/google/martian v2.1.0+incompatible/go.mod h1:9I4somxYTbIHy5NJKHRl3wXiIaQGbYVAs8BPL6v8lEs=
github.com/google/martian/v3 v3.0.0/go.mod h1:y5Zk1BBys9G+gd6Jrk0W3cC1+ELVxBWuIGO+w/tUAp0=
@@ -195,6 +199,7 @@ github.com/grpc-ecosystem/grpc-gateway v1.16.0 h1:gmcG1KaJ57LophUzW0Hy8NmPhnMZb4
github.com/grpc-ecosystem/grpc-gateway v1.16.0/go.mod h1:BDjrQk3hbvj6Nolgz8mAMFbcEtjT1g+wF4CSlocrBnw=
github.com/hashicorp/golang-lru v0.5.0/go.mod h1:/m3WP610KZHVQ1SGc6re/UDhFvYD7pJ4Ao+sR/qLZy8=
github.com/hashicorp/golang-lru v0.5.1/go.mod h1:/m3WP610KZHVQ1SGc6re/UDhFvYD7pJ4Ao+sR/qLZy8=
github.com/hpcloud/tail v1.0.0/go.mod h1:ab1qPbhIpdTxEkNHXyeSf5vhxWSCs/tWer42PpOxQnU=
github.com/ianlancetaylor/demangle v0.0.0-20181102032728-5e5cf60278f6/go.mod h1:aSSvb/t6k1mPoxDqO4vJh6VOCGPwU4O0C2/Eqndh1Sc=
github.com/jackc/chunkreader v1.0.0/go.mod h1:RT6O25fNZIuasFJRyZ4R/Y2BbhasbmZXF9QQ7T3kePo=
github.com/jackc/chunkreader/v2 v2.0.0/go.mod h1:odVSm741yZoC3dpHEUXIqA9tQRhFrgOHwnPIn9lDKlk=
@@ -245,8 +250,6 @@ github.com/jinzhu/inflection v1.0.0 h1:K317FqzuhWc8YvSVlFMCCUb36O/S9MCKRDI7QkRKD
github.com/jinzhu/inflection v1.0.0/go.mod h1:h+uFLlag+Qp1Va5pdKtLDYj+kHp5pxUVkryuEj+Srlc=
github.com/jinzhu/now v1.1.2 h1:eVKgfIdy9b6zbWBMgFpfDPoAMifwSZagU9HmEU6zgiI=
github.com/jinzhu/now v1.1.2/go.mod h1:d3SSVoowX0Lcu0IBviAWJpolVfI5UJVZZ7cO71lE/z8=
github.com/josharian/intern v1.0.0 h1:vlS4z54oSdjm0bgjRigI+G1HpF+tI+9rE5LLzOg8HmY=
github.com/josharian/intern v1.0.0/go.mod h1:5DoeVV0s6jJacbCEi61lwdGj/aVlrQvzHFFd8Hwg//Y=
github.com/jpillora/backoff v1.0.0/go.mod h1:J/6gKK9jxlEcS3zixgDgUAsiuZ7yrSoa/FX5e0EB2j4=
github.com/json-iterator/go v1.1.6/go.mod h1:+SdeFBvtyEkXs7REEP0seUULqWtbJapLOCVDaaPEHmU=
github.com/json-iterator/go v1.1.9/go.mod h1:KdQUCv79m/52Kvf8AW2vK1V8akMuk1QjK/uOdHXbAo4=
@@ -279,8 +282,6 @@ github.com/lib/pq v1.1.0/go.mod h1:5WUZQaWbwv1U+lTReE5YruASi9Al49XbQIvNi/34Woo=
github.com/lib/pq v1.2.0/go.mod h1:5WUZQaWbwv1U+lTReE5YruASi9Al49XbQIvNi/34Woo=
github.com/lib/pq v1.10.2 h1:AqzbZs4ZoCBp+GtejcpCpcxM3zlSMx29dXbUSeVtJb8=
github.com/lib/pq v1.10.2/go.mod h1:AlVN5x4E4T544tWzH6hKfbfQvm3HdbOxrmggDNAPY9o=
github.com/mailru/easyjson v0.7.7 h1:UGYAvKxe3sBsEDzO8ZeWOSlIQfWFlxbzLZe7hwFURr0=
github.com/mailru/easyjson v0.7.7/go.mod h1:xzfreul335JAWq5oZzymOObrkdz5UnU4kGfJJLY9Nlc=
github.com/mattn/go-colorable v0.1.1/go.mod h1:FuOcm+DKB9mbwrcAfNl7/TZVBZ6rcnceauSikq3lYCQ=
github.com/mattn/go-colorable v0.1.6/go.mod h1:u6P/XSegPjTcexA+o6vUJrdnUu04hMope9wVRipJSqc=
github.com/mattn/go-isatty v0.0.5/go.mod h1:Iq45c/XA43vh69/j3iqttzPXn0bhXyGjM0Hdxcsrc5s=
@@ -298,9 +299,17 @@ github.com/modern-go/reflect2 v1.0.2 h1:xBagoLtFs94CBntxluKeaWgTMpvLxC4ur3nMaC9G
github.com/modern-go/reflect2 v1.0.2/go.mod h1:yWuevngMOJpCy52FWWMvUC8ws7m/LJsjYzDa0/r8luk=
github.com/mwitkow/go-conntrack v0.0.0-20161129095857-cc309e4a2223/go.mod h1:qRWi+5nqEBWmkhHvq77mSJWrCKwh8bxhgT7d/eI7P4U=
github.com/mwitkow/go-conntrack v0.0.0-20190716064945-2f068394615f/go.mod h1:qRWi+5nqEBWmkhHvq77mSJWrCKwh8bxhgT7d/eI7P4U=
github.com/nxadm/tail v1.4.4/go.mod h1:kenIhsEOeOJmVchQTgglprH7qJGnHDVpk1VPCcaMI8A=
github.com/nxadm/tail v1.4.8 h1:nPr65rt6Y5JFSKQO7qToXr7pePgD6Gwiw05lkbyAQTE=
github.com/onsi/ginkgo v1.16.5 h1:8xi0RTUf59SOSfEtZMvwTvXYMzG4gV23XVHOZiXNtnE=
github.com/onsi/gomega v1.21.1 h1:OB/euWYIExnPBohllTicTHmGTrMaqJ67nIu80j0/uEM=
github.com/nxadm/tail v1.4.8/go.mod h1:+ncqLTQzXmGhMZNUePPaPqPvBxHAIsmXswZKocGu+AU=
github.com/onsi/ginkgo v1.6.0/go.mod h1:lLunBs/Ym6LB5Z9jYTR76FiuTmxDTDusOGeTQH+WWjE=
github.com/onsi/ginkgo v1.12.1/go.mod h1:zj2OWP4+oCPe1qIXoGWkgMRwljMUYCdkwsT2108oapk=
github.com/onsi/ginkgo v1.16.4 h1:29JGrr5oVBm5ulCWet69zQkzWipVXIol6ygQUe/EzNc=
github.com/onsi/ginkgo v1.16.4/go.mod h1:dX+/inL/fNMqNlz0e9LfyB9TswhZpCVdJM/Z6Vvnwo0=
github.com/onsi/gomega v1.7.1/go.mod h1:XdKZgCCFLUoM/7CFJVPcG8C1xQ1AJ0vpAezJrB7JYyY=
github.com/onsi/gomega v1.10.1/go.mod h1:iN09h71vgCQne3DLsj+A5owkum+a2tYe+TOCB1ybHNo=
github.com/onsi/gomega v1.15.0 h1:WjP/FQ/sk43MRmnEcT+MlDw2TFvkrXlprrPST/IudjU=
github.com/onsi/gomega v1.15.0/go.mod h1:cIuvLEne0aoVhAgh/O6ac0Op8WWw9H6eYCriF+tEHG0=
github.com/orcaman/concurrent-map v0.0.0-20210501183033-44dafcb38ecc h1:Ak86L+yDSOzKFa7WM5bf5itSOo1e3Xh8bm5YCMUXIjQ=
github.com/orcaman/concurrent-map v0.0.0-20210501183033-44dafcb38ecc/go.mod h1:Lu3tH6HLW3feq74c2GC+jIMS/K2CFcDWnWD9XkenwhI=
github.com/pkg/errors v0.8.0/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0=
@@ -364,16 +373,16 @@ github.com/stretchr/testify v1.3.0/go.mod h1:M5WIy9Dh21IEIfnGCwXGc5bZfKNJtfHm1UV
github.com/stretchr/testify v1.4.0/go.mod h1:j7eGeouHqKxXV5pUuKE4zz7dFj8WfuZ+81PSLYec5m4=
github.com/stretchr/testify v1.5.1/go.mod h1:5W2xD1RspED5o8YsWQXVCued0rvSQ+mT+I5cxcmMvtA=
github.com/stretchr/testify v1.6.1/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
github.com/stretchr/testify v1.7.0 h1:nwc3DEeHmmLAfoZucVR881uASk0Mfjw8xYJ99tb5CcY=
github.com/stretchr/testify v1.7.0/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
github.com/stretchr/testify v1.8.0 h1:pSgiaMZlXftHpm5L7V1+rVB+AZJydKsMxsQBIJw4PKk=
github.com/tidwall/gjson v1.14.0 h1:6aeJ0bzojgWLa82gDQHcx3S0Lr/O51I9bJ5nv6JFx5w=
github.com/tidwall/gjson v1.14.0/go.mod h1:/wbyibRr2FHMks5tjHJ5F8dMZh3AcwJEMf5vlfC0lxk=
github.com/tidwall/match v1.1.1 h1:+Ho715JplO36QYgwN9PGYNhgZvoUSc9X2c80KVTi+GA=
github.com/tidwall/match v1.1.1/go.mod h1:eRSPERbgtNPcGhD8UCthc6PmLEQXEWd3PRB5JTxsfmM=
github.com/tidwall/pretty v1.2.0 h1:RWIZEg2iJ8/g6fDDYzMpobmaoGh5OLl4AXtGUGPcqCs=
github.com/tidwall/pretty v1.2.0/go.mod h1:ITEVvHYasfjBbM0u2Pg8T2nJnzm8xPwvNhhsoaGGjNU=
github.com/toolkits/pkg v1.3.3 h1:qpQAQ18Jr47dv4NcBALlH0ad7L2PuqSh5K+nJKNg5lU=
github.com/toolkits/pkg v1.3.3/go.mod h1:USXArTJlz1f1DCnQHNPYugO8GPkr1NRhP4eYQZQVshk=
github.com/toolkits/pkg v1.2.9 h1:zGlrJDl+2sMBoxBRIoMtAwvKmW5wctuji2+qHCecMKk=
github.com/toolkits/pkg v1.2.9/go.mod h1:ZUsQAOoaR99PSbes+RXSirvwmtd6+XIUvizCmrjfUYc=
github.com/ugorji/go v1.1.7/go.mod h1:kZn38zHttfInRq0xu/PH0az30d+z6vm202qpg1oXVMw=
github.com/ugorji/go/codec v1.1.7 h1:2SvQaVZ1ouYrrKKwoSk2pzd4A9evlKJb9oTL+OaLUSs=
github.com/ugorji/go/codec v1.1.7/go.mod h1:Ax+UKWsSmolVDwsd+7N3ZtXu+yMGCf907BLYF3GoBXY=
@@ -383,7 +392,6 @@ github.com/yuin/goldmark v1.1.25/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9de
github.com/yuin/goldmark v1.1.27/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74=
github.com/yuin/goldmark v1.1.32/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74=
github.com/yuin/goldmark v1.2.1/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74=
github.com/yuin/goldmark v1.4.13/go.mod h1:6yULJ656Px+3vBD8DxQVa3kxgyrAnzto9xy5taEt/CY=
github.com/zenazn/goji v0.9.0/go.mod h1:7S9M489iMyHBNxwZnk9/EHS098H4/F6TATF2mIxtB1Q=
go.opencensus.io v0.21.0/go.mod h1:mSImk1erAIZhrmZN+AvHh14ztQfjbGwt4TtuofqLduU=
go.opencensus.io v0.22.0/go.mod h1:+kGneAE2xo2IficOXnaByMWTGM9T73dGwxeWcUqIpI8=
@@ -416,9 +424,8 @@ golang.org/x/crypto v0.0.0-20200622213623-75b288015ac9/go.mod h1:LzIPMQfyMNhhGPh
golang.org/x/crypto v0.0.0-20201203163018-be400aefbc4c/go.mod h1:jdWPYTVW3xRLrWPugEBEK3UY2ZEsg3UU495nc5E+M+I=
golang.org/x/crypto v0.0.0-20210616213533-5ff15b29337e/go.mod h1:GvvjBRRGRdwPK5ydBHafDWAxML/pGHZbMvKqRZ5+Abc=
golang.org/x/crypto v0.0.0-20210711020723-a769d52b0f97/go.mod h1:GvvjBRRGRdwPK5ydBHafDWAxML/pGHZbMvKqRZ5+Abc=
golang.org/x/crypto v0.0.0-20210817164053-32db794688a5 h1:HWj/xjIHfjYU5nVXpTM0s39J9CbLn7Cc5a7IC5rwsMQ=
golang.org/x/crypto v0.0.0-20210817164053-32db794688a5/go.mod h1:GvvjBRRGRdwPK5ydBHafDWAxML/pGHZbMvKqRZ5+Abc=
golang.org/x/crypto v0.0.0-20210921155107-089bfa567519 h1:7I4JAnoQBe7ZtJcBaYHi5UtiO8tQHbUSXxL+pnGRANg=
golang.org/x/crypto v0.0.0-20210921155107-089bfa567519/go.mod h1:GvvjBRRGRdwPK5ydBHafDWAxML/pGHZbMvKqRZ5+Abc=
golang.org/x/exp v0.0.0-20190121172915-509febef88a4/go.mod h1:CJ0aWSM057203Lf6IL+f9T1iT9GByDxfZKAQTCR3kQA=
golang.org/x/exp v0.0.0-20190306152737-a1d7652674e8/go.mod h1:CJ0aWSM057203Lf6IL+f9T1iT9GByDxfZKAQTCR3kQA=
golang.org/x/exp v0.0.0-20190510132918-efd6b22b2522/go.mod h1:ZjyILWgesfNpC6sMxTJOJm9Kp84zZh5NQWvqDGG3Qr8=
@@ -449,9 +456,9 @@ golang.org/x/mod v0.1.1-0.20191105210325-c90efee705ee/go.mod h1:QqPTAvyqsEbceGzB
golang.org/x/mod v0.1.1-0.20191107180719-034126e5016b/go.mod h1:QqPTAvyqsEbceGzBzNggFXnrqF1CaUcvgkdR5Ot7KZg=
golang.org/x/mod v0.2.0/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA=
golang.org/x/mod v0.3.0/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA=
golang.org/x/mod v0.6.0-dev.0.20220419223038-86c51ed26bb4/go.mod h1:jJ57K6gSWd91VN4djpZkiMVwK6gcyfeH4XE8wZrZaV4=
golang.org/x/net v0.0.0-20180724234803-3673e40ba225/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
golang.org/x/net v0.0.0-20180826012351-8a410e7b638d/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
golang.org/x/net v0.0.0-20180906233101-161cd47e91fd/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
golang.org/x/net v0.0.0-20181114220301-adae6a3d119a/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
golang.org/x/net v0.0.0-20190108225652-1e06a53dbb7e/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
golang.org/x/net v0.0.0-20190213061140-3a22650c66bd/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
@@ -475,6 +482,7 @@ golang.org/x/net v0.0.0-20200324143707-d3edc9973b7e/go.mod h1:qpuaurCH72eLCgpAm/
golang.org/x/net v0.0.0-20200501053045-e0ff5e5a1de5/go.mod h1:qpuaurCH72eLCgpAm/N6yyVIVM9cpaDIP3A8BGJEC5A=
golang.org/x/net v0.0.0-20200506145744-7e3656a0809f/go.mod h1:qpuaurCH72eLCgpAm/N6yyVIVM9cpaDIP3A8BGJEC5A=
golang.org/x/net v0.0.0-20200513185701-a91f0712d120/go.mod h1:qpuaurCH72eLCgpAm/N6yyVIVM9cpaDIP3A8BGJEC5A=
golang.org/x/net v0.0.0-20200520004742-59133d7f0dd7/go.mod h1:qpuaurCH72eLCgpAm/N6yyVIVM9cpaDIP3A8BGJEC5A=
golang.org/x/net v0.0.0-20200520182314-0ba52f642ac2/go.mod h1:qpuaurCH72eLCgpAm/N6yyVIVM9cpaDIP3A8BGJEC5A=
golang.org/x/net v0.0.0-20200625001655-4c5254603344/go.mod h1:/O7V0waA8r7cgGh81Ro3o1hOxt32SMVPicZroKQ2sZA=
golang.org/x/net v0.0.0-20200707034311-ab3426394381/go.mod h1:/O7V0waA8r7cgGh81Ro3o1hOxt32SMVPicZroKQ2sZA=
@@ -482,9 +490,10 @@ golang.org/x/net v0.0.0-20200822124328-c89045814202/go.mod h1:/O7V0waA8r7cgGh81R
golang.org/x/net v0.0.0-20201021035429-f5854403a974/go.mod h1:sp8m0HH+o8qH0wwXwYZr8TS3Oi6o0r6Gce1SSxlDquU=
golang.org/x/net v0.0.0-20210226172049-e18ecbb05110/go.mod h1:m0MpNAwzfU5UDzcl9v0D8zg8gWTRqZa9RBIspLL5mdg=
golang.org/x/net v0.0.0-20210405180319-a5a99cb37ef4/go.mod h1:p54w0d4576C0XHj96bSt6lcn1PtDYWL6XObtHCRCNQM=
golang.org/x/net v0.0.0-20210428140749-89ef3d95e781/go.mod h1:OJAsFXCWl8Ukc7SiCT/9KSuxbyM7479/AVlXFRxuMCk=
golang.org/x/net v0.0.0-20210525063256-abc453219eb5/go.mod h1:9nx3DQGgdP8bBQD5qxJ1jj9UTztislL4KSBs9R2vV5Y=
golang.org/x/net v0.0.0-20220722155237-a158d28d115b h1:PxfKdU9lEEDYjdIzOtC4qFWgkU2rGHdKlKowJSMN9h0=
golang.org/x/net v0.0.0-20220722155237-a158d28d115b/go.mod h1:XRhObCWvk6IyKnWLug+ECip1KBveYUHfp+8e9klMJ9c=
golang.org/x/net v0.0.0-20210805182204-aaa1db679c0d h1:20cMwl2fHAzkJMEA+8J4JgqBQcQGzbisXo31MIeenXI=
golang.org/x/net v0.0.0-20210805182204-aaa1db679c0d/go.mod h1:9nx3DQGgdP8bBQD5qxJ1jj9UTztislL4KSBs9R2vV5Y=
golang.org/x/oauth2 v0.0.0-20180821212333-d2e6202438be/go.mod h1:N/0e6XlmueqKjAGxoOufVs8QHGRruUQn6yWY3a++T0U=
golang.org/x/oauth2 v0.0.0-20190226205417-e64efc72b421/go.mod h1:gOpvHmFTYa4IltrdGE7lF6nIHvwfUNPOp7c8zoXwtLw=
golang.org/x/oauth2 v0.0.0-20190604053449-0f29369cfe45/go.mod h1:gOpvHmFTYa4IltrdGE7lF6nIHvwfUNPOp7c8zoXwtLw=
@@ -502,9 +511,9 @@ golang.org/x/sync v0.0.0-20200317015054-43a5402ce75a/go.mod h1:RxMgew5VJxzue5/jJ
golang.org/x/sync v0.0.0-20200625203802-6e8e738ad208/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20201020160332-67f06af15bc9/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20201207232520-09787c993a3a/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20220722155255-886fb9371eb4/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sys v0.0.0-20180830151530-49385e6e1522/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20180905080454-ebe1bf3edb33/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20180909124046-d0be0721c37e/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20181116152217-5ac8a444bdc5/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20190222072716-a9d3bda3a223/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
@@ -518,8 +527,11 @@ golang.org/x/sys v0.0.0-20190606165138-5da285871e9c/go.mod h1:h1NjWce9XRLGQEsW7w
golang.org/x/sys v0.0.0-20190624142023-c5567b49c5d0/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20190726091711-fc99dfbffb4e/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20190813064441-fde4db37ae7a/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20190904154756-749cb33beabd/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20191001151750-bb3f8db39f24/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20191005200804-aed5e4c7ecf9/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20191026070338-33540a1f6037/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20191120155948-bd437916bb0e/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20191204072324-ce4227a45e2e/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20191228213918-04cbcbbfeed8/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20200106162015-b016eb3dc98e/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
@@ -541,19 +553,17 @@ golang.org/x/sys v0.0.0-20200625212154-ddb9806d33ae/go.mod h1:h1NjWce9XRLGQEsW7w
golang.org/x/sys v0.0.0-20200803210538-64077c9b5642/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20200930185726-fdedc70b468f/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20201119102817-f84b799fce68/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20210112080510-489259a85091/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20210124154548-22da62e12c0c/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20210330210617-4fbd30eecc44/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20210423082822-04245dca01da/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20210510120138-977fb7262007/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.0.0-20210603081109-ebe580a85c40/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.0.0-20210615035016-665e8c7367d1/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.0.0-20220114195835-da31bd327af9 h1:XfKQ4OlFl8okEOr5UvAqFRVj8pY/4yfcXrddB8qAbU0=
golang.org/x/sys v0.0.0-20220114195835-da31bd327af9/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.0.0-20220520151302-bc2c85ada10a/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.0.0-20220722155257-8c9f86f7a55f h1:v4INt8xihDGvnrfjMDVXGxw9wrfxYyCjk0KbXjhR55s=
golang.org/x/sys v0.0.0-20220722155257-8c9f86f7a55f/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/term v0.0.0-20201117132131-f5c789dd3221/go.mod h1:Nr5EML6q2oocZ2LXRh80K7BxOlk5/8JxuGnuhpl+muw=
golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo=
golang.org/x/term v0.0.0-20210927222741-03fcf44c2211/go.mod h1:jbD1KX2456YbFQfuXm/mYQcufACuNUgVhRMnK/tPxf8=
golang.org/x/text v0.0.0-20170915032832-14c0d48ead0c/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
golang.org/x/text v0.3.1-0.20180807135948-17ff2d5776d2/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
@@ -562,9 +572,8 @@ golang.org/x/text v0.3.3/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
golang.org/x/text v0.3.4/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
golang.org/x/text v0.3.5/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
golang.org/x/text v0.3.6/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
golang.org/x/text v0.3.7 h1:olpwvP2KacW1ZWvsR7uQhoyTYvKAupfQrRGBFM352Gk=
golang.org/x/text v0.3.7/go.mod h1:u+2+/6zg+i71rQMx5EYifcz6MCKuco9NR6JIITiCfzQ=
golang.org/x/text v0.3.8 h1:nAL+RVCQ9uMn3vJZbV+MRnydTJFPf8qqY42YiA6MrqY=
golang.org/x/text v0.3.8/go.mod h1:E6s5w1FMmriuDzIBO73fBruAKo1PCIq6d2Q6DHfQ8WQ=
golang.org/x/time v0.0.0-20181108054448-85acf8d2951c/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ=
golang.org/x/time v0.0.0-20190308202827-9d24e82272b4/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ=
golang.org/x/time v0.0.0-20191024005414-555d28b269f0/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ=
@@ -614,13 +623,14 @@ golang.org/x/tools v0.0.0-20200619180055-7c47624df98f/go.mod h1:EkVYQZoAsY45+roY
golang.org/x/tools v0.0.0-20200729194436-6467de6f59a7/go.mod h1:njjCfa9FT2d7l9Bc6FUM5FLjQPp3cFF28FI3qnDFljA=
golang.org/x/tools v0.0.0-20200804011535-6c149bb5ef0d/go.mod h1:njjCfa9FT2d7l9Bc6FUM5FLjQPp3cFF28FI3qnDFljA=
golang.org/x/tools v0.0.0-20200825202427-b303f430e36d/go.mod h1:njjCfa9FT2d7l9Bc6FUM5FLjQPp3cFF28FI3qnDFljA=
golang.org/x/tools v0.0.0-20201224043029-2b0845dc783e/go.mod h1:emZCQorbCU4vsT4fOWvOPXz4eW1wZW4PmDk9uLelYpA=
golang.org/x/tools v0.0.0-20210106214847-113979e3529a/go.mod h1:emZCQorbCU4vsT4fOWvOPXz4eW1wZW4PmDk9uLelYpA=
golang.org/x/tools v0.1.12/go.mod h1:hNGJHUnrk76NpqgfD5Aqm5Crs+Hm0VOH/i9J2+nxYbc=
golang.org/x/xerrors v0.0.0-20190410155217-1f06c39b4373/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
golang.org/x/xerrors v0.0.0-20190513163551-3ee3066db522/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
golang.org/x/xerrors v0.0.0-20190717185122-a985d3407aa7/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
golang.org/x/xerrors v0.0.0-20191011141410-1b5146add898/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1 h1:go1bK/D/BFZV2I8cIQd1NKEZ+0owSTG1fDTci4IqFcE=
golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
google.golang.org/api v0.4.0/go.mod h1:8k5glujaEP+g9n7WNsDg8QP6cUVNI86fCNMcbazEtwE=
google.golang.org/api v0.7.0/go.mod h1:WtwebWUNSVBH/HAw79HIFXZNqEvBhG+Ra+ax0hx3E3M=
@@ -716,12 +726,14 @@ gopkg.in/check.v1 v1.0.0-20180628173108-788fd7840127/go.mod h1:Co6ibVJAznAaIkqp8
gopkg.in/check.v1 v1.0.0-20190902080502-41f04d3bba15 h1:YR8cESwS4TdDjEe65xsg0ogRM/Nc3DYOhEAlW+xobZo=
gopkg.in/check.v1 v1.0.0-20190902080502-41f04d3bba15/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/errgo.v2 v2.1.0/go.mod h1:hNsd1EY+bozCKY1Ytp96fpM3vjJbqLJn88ws8XvfDNI=
gopkg.in/fsnotify.v1 v1.4.7/go.mod h1:Tz8NjZHkW78fSQdbUxIjBTcgA1z1m8ZHf0WmKUhAMys=
gopkg.in/gomail.v2 v2.0.0-20160411212932-81ebce5c23df h1:n7WqCuqOuCbNr617RXOY0AWRXxgwEyPp2z+p0+hgMuE=
gopkg.in/gomail.v2 v2.0.0-20160411212932-81ebce5c23df/go.mod h1:LRQQ+SO6ZHR7tOkpBDuZnXENFzX8qRjMDMyPD6BRkCw=
gopkg.in/inconshreveable/log15.v2 v2.0.0-20180818164646-67afb5ed74ec/go.mod h1:aPpfJ7XW+gOuirDoZ8gHhLh3kZ1B08FtV2bbmy7Jv3s=
gopkg.in/square/go-jose.v2 v2.6.0 h1:NGk74WTnPKBNUhNzQX7PYcTLUjoq7mzKk2OKbvwk2iI=
gopkg.in/square/go-jose.v2 v2.6.0/go.mod h1:M9dMgbHiYLoDGQrXy7OpJDJWiKiU//h+vD76mk0e1AI=
gopkg.in/tomb.v1 v1.0.0-20141024135613-dd632973f1e7 h1:uRGJdciOHaEIrze2W8Q3AKkepLTh2hOroT7a+7czfdQ=
gopkg.in/tomb.v1 v1.0.0-20141024135613-dd632973f1e7/go.mod h1:dt/ZhP58zS4L8KSrWDmTeBkI65Dw0HsyUHuEVlX15mw=
gopkg.in/yaml.v2 v2.2.1/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI=
gopkg.in/yaml.v2 v2.2.2/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI=
gopkg.in/yaml.v2 v2.2.3/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI=
@@ -731,8 +743,8 @@ gopkg.in/yaml.v2 v2.2.8/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI=
gopkg.in/yaml.v2 v2.3.0/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI=
gopkg.in/yaml.v2 v2.4.0 h1:D8xgwECY7CYvx+Y2n4sBz93Jn9JRvxdiyyo8CTfuKaY=
gopkg.in/yaml.v2 v2.4.0/go.mod h1:RDklbk79AGWmwhnvt/jBztapEOGDOx6ZbXqjP6csGnQ=
gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c h1:dUUwHk2QECo/6vqA44rthZ8ie2QXMNeKRTHCNY2nXvo=
gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
gorm.io/driver/mysql v1.1.2 h1:OofcyE2lga734MxwcCW9uB4mWNXMr50uaGRVwQL2B0M=
gorm.io/driver/mysql v1.1.2/go.mod h1:4P/X9vSc3WTrhTLZ259cpFd6xKNYiSSdSZngkSBGIMM=
gorm.io/driver/postgres v1.1.1 h1:tWLmqYCyaoh89fi7DhM6QggujrOnmfo3H98AzgNAAu0=

View File

@@ -34,11 +34,6 @@ func newWebapiCmd() *cli.Command {
Aliases: []string{"c"},
Usage: "specify configuration file(.json,.yaml,.toml)",
},
&cli.StringFlag{
Name: "key",
Aliases: []string{"k"},
Usage: "specify the secret key for configuration file field encryption",
},
},
Action: func(c *cli.Context) error {
printEnv()
@@ -48,9 +43,6 @@ func newWebapiCmd() *cli.Command {
opts = append(opts, webapi.SetConfigFile(c.String("conf")))
}
opts = append(opts, webapi.SetVersion(version.VERSION))
if c.String("key") != "" {
opts = append(opts, webapi.SetKey(c.String("key")))
}
webapi.Run(opts...)
return nil
@@ -68,11 +60,6 @@ func newServerCmd() *cli.Command {
Aliases: []string{"c"},
Usage: "specify configuration file(.json,.yaml,.toml)",
},
&cli.StringFlag{
Name: "key",
Aliases: []string{"k"},
Usage: "specify the secret key for configuration file field encryption",
},
},
Action: func(c *cli.Context) error {
printEnv()
@@ -82,9 +69,6 @@ func newServerCmd() *cli.Command {
opts = append(opts, server.SetConfigFile(c.String("conf")))
}
opts = append(opts, server.SetVersion(version.VERSION))
if c.String("key") != "" {
opts = append(opts, server.SetKey(c.String("key")))
}
server.Run(opts...)
return nil

View File

@@ -3,16 +3,15 @@ package models
import (
"bytes"
"fmt"
"html/template"
"strconv"
"strings"
"text/template"
"github.com/didi/nightingale/v5/src/pkg/tplx"
)
type AlertCurEvent struct {
Id int64 `json:"id" gorm:"primaryKey"`
Cate string `json:"cate"`
Cluster string `json:"cluster"`
GroupId int64 `json:"group_id"` // busi group id
GroupName string `json:"group_name"` // busi group name
@@ -47,7 +46,6 @@ type AlertCurEvent struct {
LastEvalTime int64 `json:"last_eval_time" gorm:"-"` // for notify.py: time of the last evaluation
LastSentTime int64 `json:"last_sent_time" gorm:"-"` // time the last notification was sent
NotifyCurNumber int `json:"notify_cur_number"` // notify: current number
FirstTriggerTime int64 `json:"first_trigger_time"` // time of the first alert in a run of consecutive alerts
}
func (e *AlertCurEvent) TableName() string {
@@ -63,11 +61,10 @@ type AggrRule struct {
Value string
}
func (e *AlertCurEvent) ParseRule(field string) error {
f := e.GetField(field)
f = strings.TrimSpace(f)
func (e *AlertCurEvent) ParseRuleNote() error {
e.RuleNote = strings.TrimSpace(e.RuleNote)
if f == "" {
if e.RuleNote == "" {
return nil
}
@@ -76,8 +73,8 @@ func (e *AlertCurEvent) ParseRule(field string) error {
"{{$value := .TriggerValue}}",
}
text := strings.Join(append(defs, f), "")
t, err := template.New(fmt.Sprint(e.RuleId)).Funcs(template.FuncMap(tplx.TemplateFuncMap)).Parse(text)
text := strings.Join(append(defs, e.RuleNote), "")
t, err := template.New(fmt.Sprint(e.RuleId)).Funcs(tplx.TemplateFuncMap).Parse(text)
if err != nil {
return err
}
@@ -88,13 +85,7 @@ func (e *AlertCurEvent) ParseRule(field string) error {
return err
}
if field == "rule_name" {
e.RuleName = body.String()
}
if field == "rule_note" {
e.RuleNote = body.String()
}
e.RuleNote = body.String()
return nil
}
@@ -140,8 +131,6 @@ func (e *AlertCurEvent) GetField(field string) string {
return fmt.Sprint(e.RuleId)
case "rule_name":
return e.RuleName
case "rule_note":
return e.RuleNote
case "severity":
return fmt.Sprint(e.Severity)
case "runbook_url":
@@ -165,7 +154,6 @@ func (e *AlertCurEvent) ToHis() *AlertHisEvent {
return &AlertHisEvent{
IsRecovered: isRecovered,
Cate: e.Cate,
Cluster: e.Cluster,
GroupId: e.GroupId,
GroupName: e.GroupName,
@@ -192,7 +180,6 @@ func (e *AlertCurEvent) ToHis() *AlertHisEvent {
RecoverTime: recoverTime,
LastEvalTime: e.LastEvalTime,
NotifyCurNumber: e.NotifyCurNumber,
FirstTriggerTime: e.FirstTriggerTime,
}
}
@@ -260,7 +247,7 @@ func (e *AlertCurEvent) FillNotifyGroups(cache map[int64]*UserGroup) error {
return nil
}
func AlertCurEventTotal(prod string, bgid, stime, etime int64, severity int, clusters, cates []string, query string) (int64, error) {
func AlertCurEventTotal(prod string, bgid, stime, etime int64, severity int, clusters []string, query string) (int64, error) {
session := DB().Model(&AlertCurEvent{}).Where("trigger_time between ? and ? and rule_prod = ?", stime, etime, prod)
if bgid > 0 {
@@ -275,10 +262,6 @@ func AlertCurEventTotal(prod string, bgid, stime, etime int64, severity int, clu
session = session.Where("cluster in ?", clusters)
}
if len(cates) > 0 {
session = session.Where("cate in ?", cates)
}
if query != "" {
arr := strings.Fields(query)
for i := 0; i < len(arr); i++ {
@@ -290,7 +273,7 @@ func AlertCurEventTotal(prod string, bgid, stime, etime int64, severity int, clu
return Count(session)
}
func AlertCurEventGets(prod string, bgid, stime, etime int64, severity int, clusters, cates []string, query string, limit, offset int) ([]AlertCurEvent, error) {
func AlertCurEventGets(prod string, bgid, stime, etime int64, severity int, clusters []string, query string, limit, offset int) ([]AlertCurEvent, error) {
session := DB().Where("trigger_time between ? and ? and rule_prod = ?", stime, etime, prod)
if bgid > 0 {
@@ -305,10 +288,6 @@ func AlertCurEventGets(prod string, bgid, stime, etime int64, severity int, clus
session = session.Where("cluster in ?", clusters)
}
if len(cates) > 0 {
session = session.Where("cate in ?", cates)
}
if query != "" {
arr := strings.Fields(query)
for i := 0; i < len(arr); i++ {
@@ -420,9 +399,9 @@ func AlertCurEventGetByIds(ids []int64) ([]*AlertCurEvent, error) {
return lst, err
}
func AlertCurEventGetByRuleIdAndCluster(ruleId int64, cluster string) ([]*AlertCurEvent, error) {
func AlertCurEventGetByRule(ruleId int64) ([]*AlertCurEvent, error) {
var lst []*AlertCurEvent
err := DB().Where("rule_id=? and cluster=?", ruleId, cluster).Find(&lst).Error
err := DB().Where("rule_id=?", ruleId).Find(&lst).Error
return lst, err
}
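
The rule-note rendering in this file prepends a few variable definitions to the note and runs the result through a Go template. Below is a minimal sketch of that flow using only the standard `text/template` package. The `Event` type, the `renderNote` helper, and the `{{$labels := .TagsMap}}` definition are assumptions made for illustration; only the `{{$value := .TriggerValue}}` definition is visible in the diff, and the project's `tplx.TemplateFuncMap` is left out.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
	"text/template"
)

// Event is a hypothetical, trimmed-down stand-in for AlertCurEvent.
type Event struct {
	RuleId       int64
	RuleNote     string
	TriggerValue string
	TagsMap      map[string]string // assumed field, not shown in the diff
}

// renderNote expands template actions inside e.RuleNote in place.
// An empty note is left untouched.
func renderNote(e *Event) error {
	note := strings.TrimSpace(e.RuleNote)
	if note == "" {
		return nil
	}

	// Variable definitions prepended so that notes can use $labels and $value.
	defs := []string{
		"{{$labels := .TagsMap}}", // assumption for this sketch
		"{{$value := .TriggerValue}}",
	}

	text := strings.Join(append(defs, note), "")
	t, err := template.New(fmt.Sprint(e.RuleId)).Parse(text)
	if err != nil {
		return err
	}

	var body bytes.Buffer
	if err := t.Execute(&body, e); err != nil {
		return err
	}

	e.RuleNote = body.String()
	return nil
}

func main() {
	e := &Event{
		RuleId:       1,
		RuleNote:     "cpu on {{$labels.host}} is {{$value}}",
		TriggerValue: "93.5",
		TagsMap:      map[string]string{"host": "web01"},
	}
	if err := renderNote(e); err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Println(e.RuleNote) // cpu on web01 is 93.5
}
```

Note that `text/template` performs no HTML escaping, whereas `html/template` escapes output for HTML contexts; both imports appear in this diff, so the choice affects how notes with special characters are rendered.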

View File

@@ -7,7 +7,6 @@ import (
type AlertHisEvent struct {
Id int64 `json:"id" gorm:"primaryKey"`
Cate string `json:"cate"`
IsRecovered int `json:"is_recovered"`
Cluster string `json:"cluster"`
GroupId int64 `json:"group_id"`
@@ -39,8 +38,7 @@ type AlertHisEvent struct {
LastEvalTime int64 `json:"last_eval_time"`
Tags string `json:"-"`
TagsJSON []string `json:"tags" gorm:"-"`
NotifyCurNumber int `json:"notify_cur_number"` // notify: current number
FirstTriggerTime int64 `json:"first_trigger_time"` // time of the first alert in a run of consecutive alerts
NotifyCurNumber int `json:"notify_cur_number"` // notify: current number
}
func (e *AlertHisEvent) TableName() string {
@@ -92,7 +90,7 @@ func (e *AlertHisEvent) FillNotifyGroups(cache map[int64]*UserGroup) error {
return nil
}
func AlertHisEventTotal(prod string, bgid, stime, etime int64, severity int, recovered int, clusters, cates []string, query string) (int64, error) {
func AlertHisEventTotal(prod string, bgid, stime, etime int64, severity int, recovered int, clusters []string, query string) (int64, error) {
session := DB().Model(&AlertHisEvent{}).Where("last_eval_time between ? and ? and rule_prod = ?", stime, etime, prod)
if bgid > 0 {
@@ -111,10 +109,6 @@ func AlertHisEventTotal(prod string, bgid, stime, etime int64, severity int, rec
session = session.Where("cluster in ?", clusters)
}
if len(cates) > 0 {
session = session.Where("cate in ?", cates)
}
if query != "" {
arr := strings.Fields(query)
for i := 0; i < len(arr); i++ {
@@ -126,7 +120,7 @@ func AlertHisEventTotal(prod string, bgid, stime, etime int64, severity int, rec
return Count(session)
}
func AlertHisEventGets(prod string, bgid, stime, etime int64, severity int, recovered int, clusters, cates []string, query string, limit, offset int) ([]AlertHisEvent, error) {
func AlertHisEventGets(prod string, bgid, stime, etime int64, severity int, recovered int, clusters []string, query string, limit, offset int) ([]AlertHisEvent, error) {
session := DB().Where("last_eval_time between ? and ? and rule_prod = ?", stime, etime, prod)
if bgid > 0 {
@@ -145,10 +139,6 @@ func AlertHisEventGets(prod string, bgid, stime, etime int64, severity int, reco
session = session.Where("cluster in ?", clusters)
}
if len(cates) > 0 {
session = session.Where("cate in ?", cates)
}
if query != "" {
arr := strings.Fields(query)
for i := 0; i < len(arr); i++ {

View File

@@ -13,28 +13,23 @@ import (
type TagFilter struct {
Key string `json:"key"` // tag key
Func string `json:"func"` // `==` | `=~` | `in` | `!=` | `!~` | `not in`
Func string `json:"func"` // == | =~ | in
Value string `json:"value"` // tag value
Regexp *regexp.Regexp // parse value to regexp if func = '=~' or '!~'
Vset map[string]struct{} // parse value to a set if func = 'in' or 'not in'
Regexp *regexp.Regexp // parse value to regexp if func = '=~'
Vset map[string]struct{} // parse value to a set if func = 'in'
}
type AlertMute struct {
Id int64 `json:"id" gorm:"primaryKey"`
GroupId int64 `json:"group_id"`
Note string `json:"note"`
Cate string `json:"cate"`
Prod string `json:"prod"` // product empty means n9e
Cluster string `json:"cluster"` // take effect by clusters, separated by space
Tags ormx.JSONArr `json:"tags"`
Cause string `json:"cause"`
Btime int64 `json:"btime"`
Etime int64 `json:"etime"`
Disabled int `json:"disabled"` // 0: enabled, 1: disabled
CreateBy string `json:"create_by"`
UpdateBy string `json:"update_by"`
CreateAt int64 `json:"create_at"`
UpdateAt int64 `json:"update_at"`
ITags []TagFilter `json:"-" gorm:"-"` // inner tags
}
@@ -42,24 +37,6 @@ func (m *AlertMute) TableName() string {
return "alert_mute"
}
func AlertMuteGetById(id int64) (*AlertMute, error) {
return AlertMuteGet("id=?", id)
}
func AlertMuteGet(where string, args ...interface{}) (*AlertMute, error) {
var lst []*AlertMute
err := DB().Where(where, args...).Find(&lst).Error
if err != nil {
return nil, err
}
if len(lst) == 0 {
return nil, nil
}
return lst[0], nil
}
func AlertMuteGets(prods []string, bgid int64, query string) (lst []AlertMute, err error) {
session := DB().Where("group_id = ? and prod in (?)", bgid, prods)
@@ -94,7 +71,7 @@ func (m *AlertMute) Verify() error {
}
if m.Etime <= m.Btime {
return fmt.Errorf("oops... etime(%d) <= btime(%d)", m.Etime, m.Btime)
return fmt.Errorf("Oops... etime(%d) <= btime(%d)", m.Etime, m.Btime)
}
if err := m.Parse(); err != nil {
@@ -136,31 +113,10 @@ func (m *AlertMute) Add() error {
if err := m.Verify(); err != nil {
return err
}
now := time.Now().Unix()
m.CreateAt = now
m.UpdateAt = now
m.CreateAt = time.Now().Unix()
return Insert(m)
}
func (m *AlertMute) Update(arm AlertMute) error {
arm.Id = m.Id
arm.GroupId = m.GroupId
arm.CreateAt = m.CreateAt
arm.CreateBy = m.CreateBy
arm.UpdateAt = time.Now().Unix()
err := arm.Verify()
if err != nil {
return err
}
return DB().Model(m).Select("*").Updates(arm).Error
}
func (m *AlertMute) UpdateFieldsMap(fields map[string]interface{}) error {
return DB().Model(m).Updates(fields).Error
}
func AlertMuteDel(ids []int64) error {
if len(ids) == 0 {
return nil
@@ -169,20 +125,13 @@ func AlertMuteDel(ids []int64) error {
}
func AlertMuteStatistics(cluster string) (*Statistics, error) {
// clean expired first
buf := int64(30)
err := DB().Where("etime < ?", time.Now().Unix()-buf).Delete(new(AlertMute)).Error
if err != nil {
return nil, err
}
session := DB().Model(&AlertMute{}).Select("count(*) as total", "max(update_at) as last_updated")
session := DB().Model(&AlertMute{}).Select("count(*) as total", "max(create_at) as last_updated")
if cluster != "" {
session = session.Where("(cluster like ? or cluster = ?)", "%"+cluster+"%", ClusterAll)
}
var stats []*Statistics
err = session.Find(&stats).Error
err := session.Find(&stats).Error
if err != nil {
return nil, err
}
@@ -191,6 +140,13 @@ func AlertMuteStatistics(cluster string) (*Statistics, error) {
}
func AlertMuteGetsByCluster(cluster string) ([]*AlertMute, error) {
// clean expired first
buf := int64(30)
err := DB().Where("etime < ?", time.Now().Unix()+buf).Delete(new(AlertMute)).Error
if err != nil {
return nil, err
}
// get my cluster's mutes
session := DB().Model(&AlertMute{})
if cluster != "" {
@@ -199,15 +155,10 @@ func AlertMuteGetsByCluster(cluster string) ([]*AlertMute, error) {
var lst []*AlertMute
var mlst []*AlertMute
err := session.Find(&lst).Error
err = session.Find(&lst).Error
if err != nil {
return nil, err
}
if cluster == "" {
return lst, nil
}
for _, m := range lst {
if MatchCluster(m.Cluster, cluster) {
mlst = append(mlst, m)
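
The two cleanup queries in this file differ only in the sign applied to the 30-second buffer: one deletes rows with `etime < now-buf`, the other with `etime < now+buf`. The sketch below works through the arithmetic with made-up timestamps. The `isPurged` helper is hypothetical and stands in for the SQL condition; it does not touch a database.

```go
package main

import "fmt"

// isPurged mimics the SQL condition `etime < cutoff` used by the cleanup.
func isPurged(etime, cutoff int64) bool {
	return etime < cutoff
}

func main() {
	now := int64(1000) // pretend "current" Unix time
	buf := int64(30)   // buffer in seconds, as in the diff

	endedLongAgo := now - 60 // mute that ended a minute ago
	endsSoon := now + 10     // mute that is still active for 10 more seconds

	// Cutoff now-buf: only mutes that ended more than 30 seconds ago are removed.
	fmt.Println(isPurged(endedLongAgo, now-buf)) // true
	fmt.Println(isPurged(endsSoon, now-buf))     // false

	// Cutoff now+buf: mutes that are still active but end within 30 seconds are removed too.
	fmt.Println(isPurged(endedLongAgo, now+buf)) // true
	fmt.Println(isPurged(endsSoon, now+buf))     // true
}
```

With the `now+buf` cutoff, a mute stops applying up to 30 seconds before its configured end time, which is worth keeping in mind when reading the two variants.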

View File

@@ -14,50 +14,44 @@ import (
)
type AlertRule struct {
Id int64 `json:"id" gorm:"primaryKey"`
GroupId int64 `json:"group_id"` // busi group id
Cate string `json:"cate"` // alert rule cate (prometheus|elasticsearch)
Cluster string `json:"cluster"` // take effect by clusters, separated by space
Name string `json:"name"` // rule name
Note string `json:"note"` // will be sent in notify
Prod string `json:"prod"` // product empty means n9e
Algorithm string `json:"algorithm"` // algorithm (''|holtwinters), empty means threshold
AlgoParams string `json:"-" gorm:"algo_params"` // params algorithm need
AlgoParamsJson interface{} `json:"algo_params" gorm:"-"` //
Delay int `json:"delay"` // Time (in seconds) to delay evaluation
Severity int `json:"severity"` // 1: Emergency 2: Warning 3: Notice
Disabled int `json:"disabled"` // 0: enabled, 1: disabled
PromForDuration int `json:"prom_for_duration"` // prometheus for, unit:s
PromQl string `json:"prom_ql"` // just one ql
PromEvalInterval int `json:"prom_eval_interval"` // unit:s
EnableStime string `json:"-"` // split by space: "00:00 10:00 12:00"
EnableStimeJSON string `json:"enable_stime" gorm:"-"` // for fe
EnableStimesJSON []string `json:"enable_stimes" gorm:"-"` // for fe
EnableEtime string `json:"-"` // split by space: "00:00 10:00 12:00"
EnableEtimeJSON string `json:"enable_etime" gorm:"-"` // for fe
EnableEtimesJSON []string `json:"enable_etimes" gorm:"-"` // for fe
EnableDaysOfWeek string `json:"-"` // eg: "0 1 2 3 4 5 6 ; 0 1 2"
EnableDaysOfWeekJSON []string `json:"enable_days_of_week" gorm:"-"` // for fe
EnableDaysOfWeeksJSON [][]string `json:"enable_days_of_weeks" gorm:"-"` // for fe
EnableInBG int `json:"enable_in_bg"` // 0: global 1: enable one busi-group
NotifyRecovered int `json:"notify_recovered"` // whether notify when recovery
NotifyChannels string `json:"-"` // split by space: sms voice email dingtalk wecom
NotifyChannelsJSON []string `json:"notify_channels" gorm:"-"` // for fe
NotifyGroups string `json:"-"` // split by space: 233 43
NotifyGroupsObj []UserGroup `json:"notify_groups_obj" gorm:"-"` // for fe
NotifyGroupsJSON []string `json:"notify_groups" gorm:"-"` // for fe
NotifyRepeatStep int `json:"notify_repeat_step"` // notify repeat interval, unit: min
NotifyMaxNumber int `json:"notify_max_number"` // notify: max number
RecoverDuration int64 `json:"recover_duration"` // unit: s
Callbacks string `json:"-"` // split by space: http://a.com/api/x http://a.com/api/y
CallbacksJSON []string `json:"callbacks" gorm:"-"` // for fe
RunbookUrl string `json:"runbook_url"` // sop url
AppendTags string `json:"-"` // split by space: service=n9e mod=api
AppendTagsJSON []string `json:"append_tags" gorm:"-"` // for fe
CreateAt int64 `json:"create_at"`
CreateBy string `json:"create_by"`
UpdateAt int64 `json:"update_at"`
UpdateBy string `json:"update_by"`
Id int64 `json:"id" gorm:"primaryKey"`
GroupId int64 `json:"group_id"` // busi group id
Cluster string `json:"cluster"` // take effect by clusters, separated by space
Name string `json:"name"` // rule name
Note string `json:"note"` // will be sent in notify
Prod string `json:"prod"` // product empty means n9e
Algorithm string `json:"algorithm"` // algorithm (''|holtwinters), empty means threshold
AlgoParams string `json:"-" gorm:"algo_params"` // params algorithm need
AlgoParamsJson interface{} `json:"algo_params" gorm:"-"` //
Delay int `json:"delay"` // Time (in seconds) to delay evaluation
Severity int `json:"severity"` // 1: Emergency 2: Warning 3: Notice
Disabled int `json:"disabled"` // 0: enabled, 1: disabled
PromForDuration int `json:"prom_for_duration"` // prometheus for, unit:s
PromQl string `json:"prom_ql"` // just one ql
PromEvalInterval int `json:"prom_eval_interval"` // unit:s
EnableStime string `json:"enable_stime"` // e.g. 00:00
EnableEtime string `json:"enable_etime"` // e.g. 23:59
EnableDaysOfWeek string `json:"-"` // split by space: 0 1 2 3 4 5 6
EnableDaysOfWeekJSON []string `json:"enable_days_of_week" gorm:"-"` // for fe
EnableInBG int `json:"enable_in_bg"` // 0: global 1: enable one busi-group
NotifyRecovered int `json:"notify_recovered"` // whether notify when recovery
NotifyChannels string `json:"-"` // split by space: sms voice email dingtalk wecom
NotifyChannelsJSON []string `json:"notify_channels" gorm:"-"` // for fe
NotifyGroups string `json:"-"` // split by space: 233 43
NotifyGroupsObj []UserGroup `json:"notify_groups_obj" gorm:"-"` // for fe
NotifyGroupsJSON []string `json:"notify_groups" gorm:"-"` // for fe
NotifyRepeatStep int `json:"notify_repeat_step"` // notify repeat interval, unit: min
NotifyMaxNumber int `json:"notify_max_number"` // notify: max number
RecoverDuration int64 `json:"recover_duration"` // unit: s
Callbacks string `json:"-"` // split by space: http://a.com/api/x http://a.com/api/y
CallbacksJSON []string `json:"callbacks" gorm:"-"` // for fe
RunbookUrl string `json:"runbook_url"` // sop url
AppendTags string `json:"-"` // split by space: service=n9e mod=api
AppendTagsJSON []string `json:"append_tags" gorm:"-"` // for fe
CreateAt int64 `json:"create_at"`
CreateBy string `json:"create_by"`
UpdateAt int64 `json:"update_at"`
UpdateBy string `json:"update_by"`
}
func (ar *AlertRule) TableName() string {
@@ -229,29 +223,7 @@ func (ar *AlertRule) FillNotifyGroups(cache map[int64]*UserGroup) error {
}
func (ar *AlertRule) FE2DB() error {
if len(ar.EnableStimesJSON) > 0 {
ar.EnableStime = strings.Join(ar.EnableStimesJSON, " ")
ar.EnableEtime = strings.Join(ar.EnableEtimesJSON, " ")
} else {
ar.EnableStime = ar.EnableStimeJSON
ar.EnableEtime = ar.EnableEtimeJSON
}
if len(ar.EnableDaysOfWeeksJSON) > 0 {
for i := 0; i < len(ar.EnableDaysOfWeeksJSON); i++ {
if len(ar.EnableDaysOfWeeksJSON) == 1 {
ar.EnableDaysOfWeek = strings.Join(ar.EnableDaysOfWeeksJSON[i], " ")
} else {
if i == len(ar.EnableDaysOfWeeksJSON)-1 {
ar.EnableDaysOfWeek += strings.Join(ar.EnableDaysOfWeeksJSON[i], " ")
} else {
ar.EnableDaysOfWeek += strings.Join(ar.EnableDaysOfWeeksJSON[i], " ") + ";"
}
}
}
} else {
ar.EnableDaysOfWeek = strings.Join(ar.EnableDaysOfWeekJSON, " ")
}
ar.EnableDaysOfWeek = strings.Join(ar.EnableDaysOfWeekJSON, " ")
ar.NotifyChannels = strings.Join(ar.NotifyChannelsJSON, " ")
ar.NotifyGroups = strings.Join(ar.NotifyGroupsJSON, " ")
ar.Callbacks = strings.Join(ar.CallbacksJSON, " ")
@@ -266,21 +238,7 @@ func (ar *AlertRule) FE2DB() error {
}
func (ar *AlertRule) DB2FE() {
ar.EnableStimesJSON = strings.Fields(ar.EnableStime)
ar.EnableEtimesJSON = strings.Fields(ar.EnableEtime)
if len(ar.EnableEtimesJSON) > 0 {
ar.EnableStimeJSON = ar.EnableStimesJSON[0]
ar.EnableEtimeJSON = ar.EnableEtimesJSON[0]
}
cache := strings.Split(ar.EnableDaysOfWeek, ";")
for i := 0; i < len(cache); i++ {
ar.EnableDaysOfWeeksJSON = append(ar.EnableDaysOfWeeksJSON, strings.Fields(cache[i]))
}
if len(ar.EnableDaysOfWeeksJSON) > 0 {
ar.EnableDaysOfWeekJSON = ar.EnableDaysOfWeeksJSON[0]
}
ar.EnableDaysOfWeekJSON = strings.Fields(ar.EnableDaysOfWeek)
ar.NotifyChannelsJSON = strings.Fields(ar.NotifyChannels)
ar.NotifyGroupsJSON = strings.Fields(ar.NotifyGroups)
ar.CallbacksJSON = strings.Fields(ar.Callbacks)
@@ -378,7 +336,7 @@ func AlertRuleGetsByCluster(cluster string) ([]*AlertRule, error) {
return lr, err
}
func AlertRulesGetsBy(prods []string, query, algorithm, cluster string, cates []string, disabled int) ([]*AlertRule, error) {
func AlertRulesGetsBy(prods []string, query string) ([]*AlertRule, error) {
session := DB().Where("prod in (?)", prods)
if query != "" {
@@ -389,22 +347,6 @@ func AlertRulesGetsBy(prods []string, query, algorithm, cluster string, cates []
}
}
if algorithm != "" {
session = session.Where("algorithm = ?", algorithm)
}
if cluster != "" {
session = session.Where("cluster like ?", "%"+cluster+"%")
}
if len(cates) != 0 {
session = session.Where("cate in (?)", cates)
}
if disabled != -1 {
session = session.Where("disabled = ?", disabled)
}
var lst []*AlertRule
err := session.Find(&lst).Error
if err == nil {
@@ -466,38 +408,3 @@ func AlertRuleStatistics(cluster string) (*Statistics, error) {
return stats[0], nil
}
func (ar *AlertRule) IsPrometheusRule() bool {
return ar.Algorithm == "" && (ar.Cate == "" || strings.ToLower(ar.Cate) == "prometheus")
}
func (ar *AlertRule) GenerateNewEvent() *AlertCurEvent {
event := &AlertCurEvent{}
ar.UpdateEvent(event)
return event
}
func (ar *AlertRule) UpdateEvent(event *AlertCurEvent) {
if event == nil {
return
}
event.GroupId = ar.GroupId
event.Cate = ar.Cate
event.RuleId = ar.Id
event.RuleName = ar.Name
event.RuleNote = ar.Note
event.RuleProd = ar.Prod
event.RuleAlgo = ar.Algorithm
event.Severity = ar.Severity
event.PromForDuration = ar.PromForDuration
event.PromQl = ar.PromQl
event.PromEvalInterval = ar.PromEvalInterval
event.Callbacks = ar.Callbacks
event.CallbacksJSON = ar.CallbacksJSON
event.RunbookUrl = ar.RunbookUrl
event.NotifyRecovered = ar.NotifyRecovered
event.NotifyChannels = ar.NotifyChannels
event.NotifyChannelsJSON = ar.NotifyChannelsJSON
event.NotifyGroups = ar.NotifyGroups
event.NotifyGroupsJSON = ar.NotifyGroupsJSON
}
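
One side of this diff stores several day-of-week groups in a single string, with days separated by spaces and groups separated by `;` (for example `0 1 2 3 4 5 6 ; 0 1 2`). The following is a compact sketch of that encoding and its inverse. The function names `encodeDays` and `decodeDays` are hypothetical; the behavior is intended to match the `FE2DB`/`DB2FE` loops shown above, written with `strings.Join` instead of manual concatenation.

```go
package main

import (
	"fmt"
	"strings"
)

// encodeDays joins each group with spaces and the groups with ";".
func encodeDays(groups [][]string) string {
	parts := make([]string, 0, len(groups))
	for _, g := range groups {
		parts = append(parts, strings.Join(g, " "))
	}
	return strings.Join(parts, ";")
}

// decodeDays splits on ";" and then on whitespace, so stray spaces
// around the separator are tolerated.
func decodeDays(s string) [][]string {
	var groups [][]string
	for _, part := range strings.Split(s, ";") {
		groups = append(groups, strings.Fields(part))
	}
	return groups
}

func main() {
	stored := encodeDays([][]string{{"1", "2", "3", "4", "5"}, {"0", "6"}})
	fmt.Println(stored) // 1 2 3 4 5;0 6

	for i, g := range decodeDays("0 1 2 3 4 5 6 ; 0 1 2") {
		fmt.Println(i, g)
	}
}
```

As in the original `DB2FE`, decoding an empty string yields one empty group rather than no groups, because `strings.Split` always returns at least one element.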

View File

@@ -13,10 +13,7 @@ import (
type AlertSubscribe struct {
Id int64 `json:"id" gorm:"primaryKey"`
Name string `json:"name"` // AlertSubscribe name
Disabled int `json:"disabled"` // 0: enabled, 1: disabled
GroupId int64 `json:"group_id"`
Cate string `json:"cate"`
Cluster string `json:"cluster"` // take effect by clusters, separated by space
RuleId int64 `json:"rule_id"`
RuleName string `json:"rule_name" gorm:"-"` // for fe
@@ -57,10 +54,6 @@ func AlertSubscribeGet(where string, args ...interface{}) (*AlertSubscribe, erro
return lst[0], nil
}
func (s *AlertSubscribe) IsDisabled() bool {
return s.Disabled == 1
}
func (s *AlertSubscribe) Verify() error {
if s.Cluster == "" {
return errors.New("cluster invalid")
@@ -95,12 +88,12 @@ func (s *AlertSubscribe) Parse() error {
}
for i := 0; i < len(s.ITags); i++ {
if s.ITags[i].Func == "=~" || s.ITags[i].Func == "!~" {
if s.ITags[i].Func == "=~" {
s.ITags[i].Regexp, err = regexp.Compile(s.ITags[i].Value)
if err != nil {
return err
}
} else if s.ITags[i].Func == "in" || s.ITags[i].Func == "not in" {
} else if s.ITags[i].Func == "in" {
arr := strings.Fields(s.ITags[i].Value)
s.ITags[i].Vset = make(map[string]struct{})
for j := 0; j < len(arr); j++ {
@@ -238,11 +231,6 @@ func AlertSubscribeGetsByCluster(cluster string) ([]*AlertSubscribe, error) {
if err != nil {
return nil, err
}
if cluster == "" {
return lst, nil
}
for _, s := range lst {
if MatchCluster(s.Cluster, cluster) {
slst = append(slst, s)
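
The `Parse` method above precompiles a regexp for `=~`/`!~` filters and a value set for `in`/`not in` filters. How those parsed filters are then evaluated is not part of this diff, so the sketch below is an assumption about one reasonable way to do it. `newTagFilter` and `matchTag` are hypothetical names, and the choice to treat a missing tag key as a non-match is also an assumption.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// TagFilter mirrors the fields of the struct shown in the diff.
type TagFilter struct {
	Key    string
	Func   string // == | =~ | in | != | !~ | not in
	Value  string
	Regexp *regexp.Regexp      // set when Func is =~ or !~
	Vset   map[string]struct{} // set when Func is in or not in
}

// newTagFilter builds a filter and precomputes Regexp or Vset as needed.
func newTagFilter(key, fn, value string) (TagFilter, error) {
	f := TagFilter{Key: key, Func: fn, Value: value}
	switch fn {
	case "=~", "!~":
		re, err := regexp.Compile(value)
		if err != nil {
			return f, err
		}
		f.Regexp = re
	case "in", "not in":
		f.Vset = make(map[string]struct{})
		for _, v := range strings.Fields(value) {
			f.Vset[v] = struct{}{}
		}
	}
	return f, nil
}

// matchTag reports whether the tag set satisfies the filter.
// Assumption: a tag key that is absent never matches.
func matchTag(f TagFilter, tags map[string]string) bool {
	v, ok := tags[f.Key]
	if !ok {
		return false
	}
	switch f.Func {
	case "==":
		return v == f.Value
	case "!=":
		return v != f.Value
	case "=~":
		return f.Regexp.MatchString(v)
	case "!~":
		return !f.Regexp.MatchString(v)
	case "in":
		_, hit := f.Vset[v]
		return hit
	case "not in":
		_, hit := f.Vset[v]
		return !hit
	}
	return false
}

func main() {
	tags := map[string]string{"region": "bj", "service": "n9e-server"}

	in, _ := newTagFilter("region", "in", "bj sh")
	re, _ := newTagFilter("service", "=~", "^n9e-")
	notIn, _ := newTagFilter("region", "not in", "bj sh")

	fmt.Println(matchTag(in, tags))    // true
	fmt.Println(matchTag(re, tags))    // true
	fmt.Println(matchTag(notIn, tags)) // false
}
```

Precomputing the regexp and the set once, at parse time, keeps the per-event check to a single map lookup or regexp match.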

View File

@@ -1,144 +0,0 @@
package models
import (
"fmt"
"time"
)
type AlertingEngines struct {
Id int64 `json:"id" gorm:"primaryKey"`
Instance string `json:"instance"`
Cluster string `json:"cluster"` // reader cluster
Clock int64 `json:"clock"`
}
func (e *AlertingEngines) TableName() string {
return "alerting_engines"
}
// UpdateCluster: on the web page, users assign the target cluster that each n9e-server is associated with
func (e *AlertingEngines) UpdateCluster(c string) error {
count, err := Count(DB().Model(&AlertingEngines{}).Where("id<>? and instance=? and cluster=?", e.Id, e.Instance, c))
if err != nil {
return err
}
if count > 0 {
return fmt.Errorf("instance %s and cluster %s already exists", e.Instance, c)
}
e.Cluster = c
return DB().Model(e).Select("cluster").Updates(e).Error
}
func AlertingEngineAdd(instance, cluster string) error {
count, err := Count(DB().Model(&AlertingEngines{}).Where("instance=? and cluster=?", instance, cluster))
if err != nil {
return err
}
if count > 0 {
return fmt.Errorf("instance %s and cluster %s already exists", instance, cluster)
}
err = DB().Create(&AlertingEngines{
Instance: instance,
Cluster: cluster,
Clock: time.Now().Unix(),
}).Error
return err
}
func AlertingEngineDel(ids []int64) error {
if len(ids) == 0 {
return nil
}
return DB().Where("id in ?", ids).Delete(new(AlertingEngines)).Error
}
// AlertingEngineGetClusters returns the cluster names associated with the given instance name.
func AlertingEngineGetClusters(instance string) ([]string, error) {
var objs []AlertingEngines
err := DB().Where("instance=?", instance).Find(&objs).Error
if err != nil {
return []string{}, err
}
if len(objs) == 0 {
return []string{}, nil
}
var clusters []string
for i := 0; i < len(objs); i++ {
clusters = append(clusters, objs[i].Cluster)
}
return clusters, nil
}
// AlertingEngineGets fetches the list data: users need to see all n9e-server instances on the page and then assign a cluster to each.
func AlertingEngineGets(where string, args ...interface{}) ([]*AlertingEngines, error) {
var objs []*AlertingEngines
var err error
session := DB().Order("instance")
if where == "" {
err = session.Find(&objs).Error
} else {
err = session.Where(where, args...).Find(&objs).Error
}
return objs, err
}
func AlertingEngineGet(where string, args ...interface{}) (*AlertingEngines, error) {
lst, err := AlertingEngineGets(where, args...)
if err != nil {
return nil, err
}
if len(lst) == 0 {
return nil, nil
}
return lst[0], nil
}
func AlertingEngineGetsInstances(where string, args ...interface{}) ([]string, error) {
var arr []string
var err error
session := DB().Model(new(AlertingEngines)).Order("instance")
if where == "" {
err = session.Pluck("instance", &arr).Error
} else {
err = session.Where(where, args...).Pluck("instance", &arr).Error
}
return arr, err
}
func AlertingEngineHeartbeatWithCluster(instance, cluster string) error {
var total int64
err := DB().Model(new(AlertingEngines)).Where("instance=? and cluster=?", instance, cluster).Count(&total).Error
if err != nil {
return err
}
if total == 0 {
// insert
err = DB().Create(&AlertingEngines{
Instance: instance,
Cluster: cluster,
Clock: time.Now().Unix(),
}).Error
} else {
// updates
fields := map[string]interface{}{"clock": time.Now().Unix()}
err = DB().Model(new(AlertingEngines)).Where("instance=? and cluster=?", instance, cluster).Updates(fields).Error
}
return err
}
func AlertingEngineHeartbeat(instance string) error {
fields := map[string]interface{}{"clock": time.Now().Unix()}
err := DB().Model(new(AlertingEngines)).Where("instance=?", instance).Updates(fields).Error
return err
}


@@ -13,14 +13,12 @@ type Board struct {
Id int64 `json:"id" gorm:"primaryKey"`
GroupId int64 `json:"group_id"`
Name string `json:"name"`
Ident string `json:"ident"`
Tags string `json:"tags"`
CreateAt int64 `json:"create_at"`
CreateBy string `json:"create_by"`
UpdateAt int64 `json:"update_at"`
UpdateBy string `json:"update_by"`
Configs string `json:"configs" gorm:"-"`
Public int `json:"public"` // 0: false, 1: true
}
func (b *Board) TableName() string {
@@ -39,36 +37,11 @@ func (b *Board) Verify() error {
return nil
}
func (b *Board) CanRenameIdent(ident string) (bool, error) {
if ident == "" {
return true, nil
}
cnt, err := Count(DB().Model(b).Where("ident=? and id <> ?", ident, b.Id))
if err != nil {
return false, err
}
return cnt == 0, nil
}
func (b *Board) Add() error {
if err := b.Verify(); err != nil {
return err
}
if b.Ident != "" {
// ident duplicate check
cnt, err := Count(DB().Model(b).Where("ident=?", b.Ident))
if err != nil {
return err
}
if cnt > 0 {
return errors.New("Ident duplicate")
}
}
now := time.Now().Unix()
b.CreateAt = now
b.UpdateAt = now
@@ -98,20 +71,6 @@ func (b *Board) Del() error {
})
}
func BoardGetByID(id int64) (*Board, error) {
var lst []*Board
err := DB().Where("id = ?", id).Find(&lst).Error
if err != nil {
return nil, err
}
if len(lst) == 0 {
return nil, nil
}
return lst[0], nil
}
// BoardGet for detail page
func BoardGet(where string, args ...interface{}) (*Board, error) {
var lst []*Board


@@ -119,7 +119,7 @@ func (bg *BusiGroup) Del() error {
return errors.New("Some targets still in the BusiGroup")
}
has, err = Exists(DB().Model(&Board{}).Where("group_id=?", bg.Id))
has, err = Exists(DB().Model(&Dashboard{}).Where("group_id=?", bg.Id))
if err != nil {
return err
}


@@ -74,62 +74,7 @@ func ConfigsSet(ckey, cval string) error {
return err
}
func ConfigGet(id int64) (*Configs, error) {
var objs []*Configs
err := DB().Where("id=?", id).Find(&objs).Error
if len(objs) == 0 {
return nil, nil
}
return objs[0], err
}
func ConfigsGets(prefix string, limit, offset int) ([]*Configs, error) {
var objs []*Configs
session := DB()
if prefix != "" {
session = session.Where("ckey like ?", prefix+"%")
}
err := session.Order("id desc").Limit(limit).Offset(offset).Find(&objs).Error
return objs, err
}
func (c *Configs) Add() error {
num, err := Count(DB().Model(&Configs{}).Where("ckey=?", c.Ckey))
if err != nil {
return errors.WithMessage(err, "failed to count configs")
}
if num > 0 {
return errors.New("key already exists")
}
// insert
err = DB().Create(&Configs{
Ckey: c.Ckey,
Cval: c.Cval,
}).Error
return err
}
func (c *Configs) Update() error {
num, err := Count(DB().Model(&Configs{}).Where("id<>? and ckey=?", c.Id, c.Ckey))
if err != nil {
return errors.WithMessage(err, "failed to count configs")
}
if num > 0 {
return errors.New("key already exists")
}
err = DB().Model(&Configs{}).Where("id=?", c.Id).Updates(c).Error
return err
}
func ConfigsDel(ids []int64) error {
return DB().Where("id in ?", ids).Delete(&Configs{}).Error
}
func ConfigsGetsByKey(ckeys []string) (map[string]string, error) {
func ConfigsGets(ckeys []string) (map[string]string, error) {
var objs []Configs
err := DB().Where("ckey in ?", ckeys).Find(&objs).Error
if err != nil {


@@ -83,15 +83,14 @@ func (re *RecordingRule) Add() error {
return err
}
// Recording rules with duplicate names do occur in practice, so no duplicate check is needed.
//exists, err := RecordingRuleExists(0, re.GroupId, re.Cluster, re.Name)
//if err != nil {
// return err
//}
//
//if exists {
// return errors.New("RecordingRule already exists")
//}
exists, err := RecordingRuleExists(0, re.GroupId, re.Cluster, re.Name)
if err != nil {
return err
}
if exists {
return errors.New("RecordingRule already exists")
}
now := time.Now().Unix()
re.CreateAt = now
@@ -101,16 +100,15 @@ func (re *RecordingRule) Add() error {
}
func (re *RecordingRule) Update(ref RecordingRule) error {
// Recording rules with duplicate names do occur in practice, so no duplicate check is needed.
//if re.Name != ref.Name {
// exists, err := RecordingRuleExists(re.Id, re.GroupId, re.Cluster, ref.Name)
// if err != nil {
// return err
// }
// if exists {
// return errors.New("RecordingRule already exists")
// }
//}
if re.Name != ref.Name {
exists, err := RecordingRuleExists(re.Id, re.GroupId, re.Cluster, ref.Name)
if err != nil {
return err
}
if exists {
return errors.New("RecordingRule already exists")
}
}
ref.FE2DB()
ref.Id = re.Id


@@ -1,198 +0,0 @@
package models
import (
"crypto/md5"
"fmt"
"regexp"
"sort"
"strings"
"github.com/prometheus/common/model"
"github.com/prometheus/prometheus/prompb"
)
const (
Replace Action = "replace"
Keep Action = "keep"
Drop Action = "drop"
HashMod Action = "hashmod"
LabelMap Action = "labelmap"
LabelDrop Action = "labeldrop"
LabelKeep Action = "labelkeep"
Lowercase Action = "lowercase"
Uppercase Action = "uppercase"
)
type Action string
type Regexp struct {
*regexp.Regexp
}
type RelabelConfig struct {
SourceLabels model.LabelNames
Separator string
Regex interface{}
Modulus uint64
TargetLabel string
Replacement string
Action Action
}
func Process(labels []*prompb.Label, cfgs ...*RelabelConfig) []*prompb.Label {
for _, cfg := range cfgs {
labels = relabel(labels, cfg)
if labels == nil {
return nil
}
}
return labels
}
func getValue(ls []*prompb.Label, name model.LabelName) string {
for _, l := range ls {
if l.Name == string(name) {
return l.Value
}
}
return ""
}
type LabelBuilder struct {
LabelSet map[string]string
}
func newBuilder(ls []*prompb.Label) *LabelBuilder {
lset := make(map[string]string, len(ls))
for _, l := range ls {
lset[l.Name] = l.Value
}
return &LabelBuilder{LabelSet: lset}
}
func (l *LabelBuilder) set(k, v string) *LabelBuilder {
if v == "" {
return l.del(k)
}
l.LabelSet[k] = v
return l
}
func (l *LabelBuilder) del(ns ...string) *LabelBuilder {
for _, n := range ns {
delete(l.LabelSet, n)
}
return l
}
func (l *LabelBuilder) labels() []*prompb.Label {
ls := make([]*prompb.Label, 0, len(l.LabelSet))
if len(l.LabelSet) == 0 {
return ls
}
for k, v := range l.LabelSet {
ls = append(ls, &prompb.Label{
Name: k,
Value: v,
})
}
sort.Slice(ls, func(i, j int) bool {
return ls[i].Name > ls[j].Name
})
return ls
}
func relabel(lset []*prompb.Label, cfg *RelabelConfig) []*prompb.Label {
values := make([]string, 0, len(cfg.SourceLabels))
for _, ln := range cfg.SourceLabels {
values = append(values, getValue(lset, ln))
}
regx := cfg.Regex.(Regexp)
val := strings.Join(values, cfg.Separator)
lb := newBuilder(lset)
switch cfg.Action {
case Drop:
if regx.MatchString(val) {
return nil
}
case Keep:
if !regx.MatchString(val) {
return nil
}
case Replace:
indexes := regx.FindStringSubmatchIndex(val)
if indexes == nil {
break
}
target := model.LabelName(regx.ExpandString([]byte{}, cfg.TargetLabel, val, indexes))
if !target.IsValid() {
lb.del(cfg.TargetLabel)
break
}
res := regx.ExpandString([]byte{}, cfg.Replacement, val, indexes)
if len(res) == 0 {
lb.del(cfg.TargetLabel)
break
}
lb.set(string(target), string(res))
case Lowercase:
lb.set(cfg.TargetLabel, strings.ToLower(val))
case Uppercase:
lb.set(cfg.TargetLabel, strings.ToUpper(val))
case HashMod:
mod := sum64(md5.Sum([]byte(val))) % cfg.Modulus
lb.set(cfg.TargetLabel, fmt.Sprintf("%d", mod))
case LabelMap:
for _, l := range lset {
if regx.MatchString(l.Name) {
res := regx.ReplaceAllString(l.Name, cfg.Replacement)
lb.set(res, l.Value)
}
}
case LabelDrop:
for _, l := range lset {
if regx.MatchString(l.Name) {
lb.del(l.Name)
}
}
case LabelKeep:
for _, l := range lset {
if !regx.MatchString(l.Name) {
lb.del(l.Name)
}
}
default:
panic(fmt.Errorf("relabel: unknown relabel action type %q", cfg.Action))
}
return lb.labels()
}
func sum64(hash [md5.Size]byte) uint64 {
var s uint64
for i, b := range hash {
shift := uint64((md5.Size - i - 1) * 8)
s |= uint64(b) << shift
}
return s
}
func NewRegexp(s string) (Regexp, error) {
regex, err := regexp.Compile("^(?:" + s + ")$")
return Regexp{Regexp: regex}, err
}
func MustNewRegexp(s string) Regexp {
re, err := NewRegexp(s)
if err != nil {
panic(err)
}
return re
}
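
The `Replace` action and the anchored pattern built by `NewRegexp` can be illustrated in isolation. The following is a minimal sketch using only the standard library; `replaceLabel`, its map-based label set, and the sample `instance`/`host` labels are hypothetical simplifications, not the code from this file.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// replaceLabel is a hypothetical, simplified stand-in for the Replace action:
// it joins the source label values, matches them against a fully anchored
// regex, and writes the expanded replacement into the target label.
func replaceLabel(labels map[string]string, sources []string, sep, pattern, target, replacement string) (map[string]string, error) {
	// Anchor the pattern the same way NewRegexp does, so it must match the whole value.
	re, err := regexp.Compile("^(?:" + pattern + ")$")
	if err != nil {
		return nil, err
	}

	values := make([]string, 0, len(sources))
	for _, name := range sources {
		values = append(values, labels[name]) // missing labels contribute ""
	}
	val := strings.Join(values, sep)

	// Work on a copy so the caller's map is left untouched.
	out := make(map[string]string, len(labels)+1)
	for k, v := range labels {
		out[k] = v
	}

	idx := re.FindStringSubmatchIndex(val)
	if idx == nil {
		return out, nil // no match: labels pass through unchanged
	}
	res := re.ExpandString(nil, replacement, val, idx)
	if len(res) == 0 {
		delete(out, target) // an empty result removes the target label
		return out, nil
	}
	out[target] = string(res)
	return out, nil
}

func main() {
	labels := map[string]string{"instance": "10.0.0.1:9100"}
	out, err := replaceLabel(labels, []string{"instance"}, ";", `(.+):\d+`, "host", "$1")
	if err != nil {
		panic(err)
	}
	fmt.Println(out["host"]) // prints "10.0.0.1"
}
```

Because the pattern is wrapped in `^(?:...)$`, a value such as `nohost` does not match `(.+):\d+` and the label set is returned unchanged.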


@@ -20,11 +20,6 @@ type Target struct {
TagsJSON []string `json:"tags" gorm:"-"`
TagsMap map[string]string `json:"-" gorm:"-"` // internal use, append tags to series
UpdateAt int64 `json:"update_at"`
TargetUp float64 `json:"target_up" gorm:"-"`
LoadPerCore float64 `json:"load_per_core" gorm:"-"`
MemUtil float64 `json:"mem_util" gorm:"-"`
DiskUtil float64 `json:"disk_util" gorm:"-"`
}
func (t *Target) TableName() string {
@@ -116,10 +111,6 @@ func buildTargetWhere(bgid int64, clusters []string, query string) *gorm.DB {
return session
}
func TargetTotalCount() (int64, error) {
return Count(DB().Model(new(Target)))
}
func TargetTotal(bgid int64, clusters []string, query string) (int64, error) {
return Count(buildTargetWhere(bgid, clusters, query))
}


@@ -95,7 +95,7 @@ func (u *User) Update(selectField interface{}, selectFields ...interface{}) erro
return err
}
return DB().Model(u).Select(selectField, selectFields...).Updates(u).Error
return DB().Model(u).Select(selectField, selectFields).Updates(u).Error
}
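
The two versions of the `Select` call in this hunk differ only in the trailing `...`. A minimal sketch of what that changes in Go, assuming a hypothetical variadic function `countArgs` in place of the real `Select`:

```go
package main

import "fmt"

// countArgs is a hypothetical stand-in for a variadic API such as Select:
// it reports how many variadic arguments it received.
func countArgs(first interface{}, rest ...interface{}) int {
	return len(rest)
}

// forward spreads its variadic slice with "...", so each element
// arrives as a separate argument.
func forward(first interface{}, rest ...interface{}) int {
	return countArgs(first, rest...)
}

// forwardNested passes the slice itself, so the callee sees exactly
// one argument whose value is a []interface{}.
func forwardNested(first interface{}, rest ...interface{}) int {
	return countArgs(first, rest)
}

func main() {
	fmt.Println(forward("nickname", "phone", "email"))       // prints 2
	fmt.Println(forwardNested("nickname", "phone", "email")) // prints 1
}
```

Both forms compile because a `[]interface{}` is itself a valid `interface{}` value, which is why the compiler does not flag the difference.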
func (u *User) UpdateAllFields() error {
@@ -450,25 +450,6 @@ func (u *User) BusiGroups(limit int, query string, all ...bool) ([]BusiGroup, er
var lst []BusiGroup
if u.IsAdmin() || (len(all) > 0 && all[0]) {
err := session.Where("name like ?", "%"+query+"%").Find(&lst).Error
if err != nil {
return lst, err
}
if len(lst) == 0 && len(query) > 0 {
// Hidden feature, not widely advertised: query may be a target ident, so if the SQL above found nothing, try looking it up as an ident.
var t *Target
t, err = TargetGet("ident=?", query)
if err != nil {
return lst, err
}
if t == nil {
return lst, nil
}
err = DB().Order("name").Limit(limit).Where("id=?", t.GroupId).Find(&lst).Error
}
return lst, err
}
@@ -487,22 +468,6 @@ func (u *User) BusiGroups(limit int, query string, all ...bool) ([]BusiGroup, er
}
err = session.Where("id in ?", busiGroupIds).Where("name like ?", "%"+query+"%").Find(&lst).Error
if err != nil {
return nil, err
}
if len(lst) == 0 && len(query) > 0 {
var t *Target
t, err = TargetGet("ident=?", query)
if err != nil {
return lst, err
}
if t != nil && slice.ContainsInt64(busiGroupIds, t.GroupId) {
err = DB().Order("name").Limit(limit).Where("id=?", t.GroupId).Find(&lst).Error
}
}
return lst, err
}
@@ -512,23 +477,6 @@ func (u *User) UserGroups(limit int, query string) ([]UserGroup, error) {
var lst []UserGroup
if u.IsAdmin() {
err := session.Where("name like ?", "%"+query+"%").Find(&lst).Error
if err != nil {
return lst, err
}
if len(lst) == 0 && len(query) > 0 {
// Hidden feature, not widely advertised: query may be a username, so if the SQL above found nothing, try looking it up as a user.
user, err := UserGetByUsername(query)
if user == nil {
return lst, err
}
var ids []int64
ids, err = MyGroupIds(user.Id)
if err != nil || len(ids) == 0 {
return lst, err
}
lst, err = UserGroupGetByIds(ids)
}
return lst, err
}


@@ -1,150 +0,0 @@
package cas
import (
"bytes"
"context"
"net/url"
"strings"
"time"
"github.com/didi/nightingale/v5/src/storage"
"github.com/google/uuid"
"github.com/toolkits/pkg/cas"
"github.com/toolkits/pkg/logger"
)
type Config struct {
Enable bool
SsoAddr string
RedirectURL string
DisplayName string
CoverAttributes bool
Attributes struct {
Nickname string
Phone string
Email string
}
DefaultRoles []string
}
type ssoClient struct {
config Config
ssoAddr string
callbackAddr string
displayName string
attributes struct {
nickname string
phone string
email string
}
}
var (
cli ssoClient
)
func Init(cf Config) {
if !cf.Enable {
return
}
cli = ssoClient{}
cli.config = cf
cli.ssoAddr = cf.SsoAddr
cli.callbackAddr = cf.RedirectURL
cli.displayName = cf.DisplayName
cli.attributes.nickname = cf.Attributes.Nickname
cli.attributes.phone = cf.Attributes.Phone
cli.attributes.email = cf.Attributes.Email
}
func GetDisplayName() string {
return cli.displayName
}
// Authorize returns the CAS authorize location and state
func Authorize(redirect string) (string, string, error) {
state := uuid.New().String()
ctx := context.Background()
err := storage.Redis.Set(ctx, wrapStateKey(state), redirect, time.Duration(300*time.Second)).Err()
if err != nil {
return "", "", err
}
return cli.genRedirectURL(state), state, nil
}
func fetchRedirect(ctx context.Context, state string) (string, error) {
return storage.Redis.Get(ctx, wrapStateKey(state)).Result()
}
func deleteRedirect(ctx context.Context, state string) error {
return storage.Redis.Del(ctx, wrapStateKey(state)).Err()
}
func wrapStateKey(key string) string {
return "n9e_cas_" + key
}
func (cli *ssoClient) genRedirectURL(state string) string {
var buf bytes.Buffer
buf.WriteString(cli.ssoAddr + "login")
v := url.Values{
"service": {cli.callbackAddr},
}
if strings.Contains(cli.ssoAddr, "?") {
buf.WriteByte('&')
} else {
buf.WriteByte('?')
}
buf.WriteString(v.Encode())
return buf.String()
}
type CallbackOutput struct {
Redirect string `json:"redirect"`
Msg string `json:"msg"`
AccessToken string `json:"accessToken"`
Username string `json:"username"`
Nickname string `json:"nickname"`
Phone string `yaml:"phone"`
Email string `yaml:"email"`
}
func ValidateServiceTicket(ctx context.Context, ticket, state string) (ret *CallbackOutput, err error) {
casUrl, err := url.Parse(cli.config.SsoAddr)
if err != nil {
logger.Error(err)
return
}
serviceUrl, err := url.Parse(cli.callbackAddr)
if err != nil {
logger.Error(err)
return
}
resOptions := &cas.RestOptions{
CasURL: casUrl,
ServiceURL: serviceUrl,
}
resCli := cas.NewRestClient(resOptions)
authRet, err := resCli.ValidateServiceTicket(cas.ServiceTicket(ticket))
if err != nil {
logger.Errorf("Ticket Validating Failed: %s", err)
return
}
ret = &CallbackOutput{}
ret.Username = authRet.User
ret.Nickname = authRet.Attributes.Get(cli.attributes.nickname)
logger.Debugf("CAS Authentication Response's Attributes--[Nickname]: %s", ret.Nickname)
ret.Email = authRet.Attributes.Get(cli.attributes.email)
logger.Debugf("CAS Authentication Response's Attributes--[Email]: %s", ret.Email)
ret.Phone = authRet.Attributes.Get(cli.attributes.phone)
logger.Debugf("CAS Authentication Response's Attributes--[Phone]: %s", ret.Phone)
ret.Redirect, err = fetchRedirect(ctx, state)
if err != nil {
logger.Debugf("get redirect err:%v state:%s", err, state)
}
err = deleteRedirect(ctx, state)
if err != nil {
logger.Debugf("delete redirect err:%v state:%s", err, state)
}
return
}


@@ -1,225 +0,0 @@
package oauth2x
import (
"bytes"
"context"
"fmt"
"io/ioutil"
"net/http"
"time"
"github.com/didi/nightingale/v5/src/storage"
"github.com/toolkits/pkg/logger"
"github.com/google/uuid"
jsoniter "github.com/json-iterator/go"
"golang.org/x/oauth2"
)
type ssoClient struct {
config oauth2.Config
ssoAddr string
userInfoAddr string
TranTokenMethod string
callbackAddr string
displayName string
coverAttributes bool
attributes struct {
username string
nickname string
phone string
email string
}
userinfoIsArray bool
userinfoPrefix string
}
type Config struct {
Enable bool
DisplayName string
RedirectURL string
SsoAddr string
TokenAddr string
UserInfoAddr string
TranTokenMethod string
ClientId string
ClientSecret string
CoverAttributes bool
Attributes struct {
Username string
Nickname string
Phone string
Email string
}
DefaultRoles []string
UserinfoIsArray bool
UserinfoPrefix string
Scopes []string
}
var (
cli ssoClient
)
func Init(cf Config) {
if !cf.Enable {
return
}
cli.ssoAddr = cf.SsoAddr
cli.userInfoAddr = cf.UserInfoAddr
cli.TranTokenMethod = cf.TranTokenMethod
cli.callbackAddr = cf.RedirectURL
cli.displayName = cf.DisplayName
cli.coverAttributes = cf.CoverAttributes
cli.attributes.username = cf.Attributes.Username
cli.attributes.nickname = cf.Attributes.Nickname
cli.attributes.phone = cf.Attributes.Phone
cli.attributes.email = cf.Attributes.Email
cli.userinfoIsArray = cf.UserinfoIsArray
cli.userinfoPrefix = cf.UserinfoPrefix
cli.config = oauth2.Config{
ClientID: cf.ClientId,
ClientSecret: cf.ClientSecret,
Endpoint: oauth2.Endpoint{
AuthURL: cf.SsoAddr,
TokenURL: cf.TokenAddr,
},
RedirectURL: cf.RedirectURL,
Scopes: cf.Scopes,
}
}
func GetDisplayName() string {
return cli.displayName
}
func wrapStateKey(key string) string {
return "n9e_oauth_" + key
}
// Authorize returns the SSO authorize location with state
func Authorize(redirect string) (string, error) {
state := uuid.New().String()
ctx := context.Background()
err := storage.Redis.Set(ctx, wrapStateKey(state), redirect, time.Duration(300*time.Second)).Err()
if err != nil {
return "", err
}
return cli.config.AuthCodeURL(state), nil
}
func fetchRedirect(ctx context.Context, state string) (string, error) {
return storage.Redis.Get(ctx, wrapStateKey(state)).Result()
}
func deleteRedirect(ctx context.Context, state string) error {
return storage.Redis.Del(ctx, wrapStateKey(state)).Err()
}
// Callback exchanges the code for an accessToken and the user info
func Callback(ctx context.Context, code, state string) (*CallbackOutput, error) {
ret, err := exchangeUser(code)
if err != nil {
return nil, fmt.Errorf("illegal user:%v", err)
}
ret.Redirect, err = fetchRedirect(ctx, state)
if err != nil {
logger.Errorf("get redirect err:%v code:%s state:%s", err, code, state)
}
err = deleteRedirect(ctx, state)
if err != nil {
logger.Errorf("delete redirect err:%v code:%s state:%s", err, code, state)
}
return ret, nil
}
type CallbackOutput struct {
Redirect string `json:"redirect"`
Msg string `json:"msg"`
AccessToken string `json:"accessToken"`
Username string `json:"username"`
Nickname string `json:"nickname"`
Phone string `yaml:"phone"`
Email string `yaml:"email"`
}
func exchangeUser(code string) (*CallbackOutput, error) {
ctx := context.Background()
oauth2Token, err := cli.config.Exchange(ctx, code)
if err != nil {
return nil, fmt.Errorf("failed to exchange token: %s", err)
}
userInfo, err := getUserInfo(cli.userInfoAddr, oauth2Token.AccessToken, cli.TranTokenMethod)
if err != nil {
logger.Errorf("failed to get user info: %s", err)
return nil, fmt.Errorf("failed to get user info: %s", err)
}
return &CallbackOutput{
AccessToken: oauth2Token.AccessToken,
Username: getUserinfoField(userInfo, cli.userinfoIsArray, cli.userinfoPrefix, cli.attributes.username),
Nickname: getUserinfoField(userInfo, cli.userinfoIsArray, cli.userinfoPrefix, cli.attributes.nickname),
Phone: getUserinfoField(userInfo, cli.userinfoIsArray, cli.userinfoPrefix, cli.attributes.phone),
Email: getUserinfoField(userInfo, cli.userinfoIsArray, cli.userinfoPrefix, cli.attributes.email),
}, nil
}
func getUserInfo(userInfoAddr, accessToken string, TranTokenMethod string) ([]byte, error) {
var req *http.Request
if TranTokenMethod == "formdata" {
body := bytes.NewBuffer([]byte("access_token=" + accessToken))
r, err := http.NewRequest("POST", userInfoAddr, body)
if err != nil {
return nil, err
}
r.Header.Add("Content-Type", "application/x-www-form-urlencoded")
req = r
} else if TranTokenMethod == "querystring" {
r, err := http.NewRequest("GET", userInfoAddr+"?access_token="+accessToken, nil)
if err != nil {
return nil, err
}
r.Header.Add("Authorization", "Bearer "+accessToken)
req = r
} else {
r, err := http.NewRequest("GET", userInfoAddr, nil)
if err != nil {
return nil, err
}
r.Header.Add("Authorization", "Bearer "+accessToken)
req = r
}
resp, err := http.DefaultClient.Do(req)
if err != nil {
return nil, err
}
body, err := ioutil.ReadAll(resp.Body)
resp.Body.Close()
if err != nil {
return nil, err
}
return body, err
}
func getUserinfoField(input []byte, isArray bool, prefix, field string) string {
if prefix == "" {
if isArray {
return jsoniter.Get(input, 0).Get(field).ToString()
} else {
return jsoniter.Get(input, field).ToString()
}
} else {
if isArray {
return jsoniter.Get(input, prefix, 0).Get(field).ToString()
} else {
return jsoniter.Get(input, prefix).Get(field).ToString()
}
}
}
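
`getUserInfo` above builds the form body and the query string by plain string concatenation. As a hedged alternative, the three token transport modes could be sketched with `url.Values` so the token is always URL-encoded. `newUserInfoRequest` and the `sso.example.com` address are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// newUserInfoRequest is a hypothetical helper mirroring the three token
// transport modes ("formdata", "querystring", default header), with the
// token encoded through url.Values instead of string concatenation.
func newUserInfoRequest(addr, token, method string) (*http.Request, error) {
	switch method {
	case "formdata":
		form := url.Values{"access_token": {token}}
		req, err := http.NewRequest(http.MethodPost, addr, strings.NewReader(form.Encode()))
		if err != nil {
			return nil, err
		}
		req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
		return req, nil
	case "querystring":
		u, err := url.Parse(addr)
		if err != nil {
			return nil, err
		}
		q := u.Query() // keeps any parameters already present in addr
		q.Set("access_token", token)
		u.RawQuery = q.Encode()
		req, err := http.NewRequest(http.MethodGet, u.String(), nil)
		if err != nil {
			return nil, err
		}
		req.Header.Set("Authorization", "Bearer "+token)
		return req, nil
	default:
		req, err := http.NewRequest(http.MethodGet, addr, nil)
		if err != nil {
			return nil, err
		}
		req.Header.Set("Authorization", "Bearer "+token)
		return req, nil
	}
}

func main() {
	req, err := newUserInfoRequest("https://sso.example.com/userinfo?x=1", "a b", "querystring")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.URL.String())
}
```

Parsing the address first also handles a user-info URL that already carries a query string, a case where appending `?access_token=` would produce a malformed URL.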


@@ -20,7 +20,6 @@ type ssoClient struct {
ssoAddr string
callbackAddr string
coverAttributes bool
displayName string
attributes struct {
username string
nickname string
@@ -31,7 +30,6 @@ type ssoClient struct {
type Config struct {
Enable bool
DisplayName string
RedirectURL string
SsoAddr string
ClientId string
@@ -61,7 +59,6 @@ func Init(cf Config) {
cli.attributes.nickname = cf.Attributes.Nickname
cli.attributes.phone = cf.Attributes.Phone
cli.attributes.email = cf.Attributes.Email
cli.displayName = cf.DisplayName
provider, err := oidc.NewProvider(context.Background(), cf.SsoAddr)
if err != nil {
log.Fatal(err)
@@ -80,10 +77,6 @@ func Init(cf Config) {
}
}
func GetDisplayName() string {
return cli.displayName
}
func wrapStateKey(key string) string {
return "n9e_oidc_" + key
}


@@ -6,11 +6,9 @@ import (
"io/ioutil"
"net/http"
"time"
"github.com/toolkits/pkg/logger"
)
func PostJSON(url string, timeout time.Duration, v interface{}, retries ...int) (response []byte, code int, err error) {
func PostJSON(url string, timeout time.Duration, v interface{}) (response []byte, code int, err error) {
var bs []byte
bs, err = json.Marshal(v)
@@ -28,29 +26,7 @@ func PostJSON(url string, timeout time.Duration, v interface{}, retries ...int)
req.Header.Set("Content-Type", "application/json")
var resp *http.Response
if len(retries) > 0 {
for i := 0; i < retries[0]; i++ {
resp, err = client.Do(req)
if err == nil {
break
}
tryagain := ""
if i+1 < retries[0] {
tryagain = " try again"
}
logger.Warningf("failed to curl %s error: %s"+tryagain, url, err)
if i+1 < retries[0] {
time.Sleep(time.Millisecond * 200)
}
}
} else {
resp, err = client.Do(req)
}
resp, err = client.Do(req)
if err != nil {
return
}


@@ -1,100 +0,0 @@
package secu
import (
"bytes"
"crypto/aes"
"crypto/cipher"
"encoding/base64"
"strings"
)
// BASE64StdEncode encodes src as standard base64
func BASE64StdEncode(src []byte) string {
return base64.StdEncoding.EncodeToString(src)
}
// BASE64StdDecode decodes a standard base64 string
func BASE64StdDecode(src string) ([]byte, error) {
dst, err := base64.StdEncoding.DecodeString(src)
if err != nil {
return nil, err
}
return dst, nil
}
func PKCS7Padding(ciphertext []byte, blockSize int) []byte {
padding := blockSize - len(ciphertext)%blockSize
padtext := bytes.Repeat([]byte{byte(padding)}, padding)
return append(ciphertext, padtext...)
}
func PKCS7UnPadding(originData []byte) []byte {
length := len(originData)
unpadding := int(originData[length-1])
return originData[:(length - unpadding)]
}
// AesEncrypt encrypts origData with AES in CBC mode.
func AesEncrypt(origData, key []byte) ([]byte, error) {
block, err := aes.NewCipher(key)
if err != nil {
return nil, err
}
// pad the plaintext to the block size
blockSize := block.BlockSize()
padOrigData := PKCS7Padding(origData, blockSize)
// initialize the CBC encrypter
blockMode := cipher.NewCBCEncrypter(block, key[:blockSize])
crypted := make([]byte, len(padOrigData))
// encrypt
blockMode.CryptBlocks(crypted, padOrigData)
return crypted, nil
}
// AesDecrypt decrypts crypted with AES in CBC mode.
func AesDecrypt(crypted, key []byte) ([]byte, error) {
block, err := aes.NewCipher(key)
if err != nil {
return nil, err
}
blockSize := block.BlockSize()
blockMode := cipher.NewCBCDecrypter(block, key[:blockSize])
origData := make([]byte, len(crypted))
// decrypt
blockMode.CryptBlocks(origData, crypted)
// strip the padding
origData = PKCS7UnPadding(origData)
return origData, nil
}
// DealWithDecrypt decrypts a configuration-file property.
func DealWithDecrypt(src string, key string) (string, error) {
// A {{cipher}} prefix marks an encrypted property, so decrypt it first.
if strings.HasPrefix(src, "{{cipher}}") {
data := src[10:]
decodeData, err := BASE64StdDecode(data)
if err != nil {
return src, err
}
// decrypt
origin, err := AesDecrypt(decodeData, []byte(key))
if err != nil {
return src, err
}
// return the plaintext
return string(origin), nil
} else {
return src, nil
}
}
// DealWithEncrypt encrypts a configuration-file property.
func DealWithEncrypt(src string, key string) (string, error) {
encrypted, err := AesEncrypt([]byte(src), []byte(key))
if err != nil {
return src, err
}
data := BASE64StdEncode(encrypted)
return "{{cipher}}" + data, nil
}
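
`PKCS7UnPadding` in this file indexes the last byte without checking the input, so empty or malformed data would cause a panic. A minimal sketch of a checked variant, assuming hypothetical helpers `pkcs7Pad` and `pkcs7Unpad` that are not part of the package:

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// pkcs7Pad appends n bytes of value n so the length becomes a multiple of
// blockSize; input that is already aligned gets one full block of padding.
func pkcs7Pad(data []byte, blockSize int) []byte {
	n := blockSize - len(data)%blockSize
	out := append([]byte{}, data...) // copy, so the caller's slice is not modified
	return append(out, bytes.Repeat([]byte{byte(n)}, n)...)
}

// pkcs7Unpad is a hypothetical checked variant of PKCS7UnPadding: it returns
// an error instead of panicking on empty or malformed input.
func pkcs7Unpad(data []byte, blockSize int) ([]byte, error) {
	if len(data) == 0 || len(data)%blockSize != 0 {
		return nil, errors.New("pkcs7: invalid data length")
	}
	n := int(data[len(data)-1])
	if n == 0 || n > blockSize {
		return nil, errors.New("pkcs7: invalid padding size")
	}
	for _, b := range data[len(data)-n:] {
		if int(b) != n {
			return nil, errors.New("pkcs7: invalid padding bytes")
		}
	}
	return data[:len(data)-n], nil
}

func main() {
	padded := pkcs7Pad([]byte("secret"), 16)
	plain, err := pkcs7Unpad(padded, 16)
	fmt.Println(len(padded), string(plain), err) // prints "16 secret <nil>"

	_, err = pkcs7Unpad([]byte{}, 16)
	fmt.Println(err) // prints "pkcs7: invalid data length"
}
```

This sketch covers only the padding step. The CBC encryption and the `{{cipher}}` prefix handling are left as they appear in the file.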


@@ -12,26 +12,25 @@ import (
// ClientConfig represents the standard client TLS config.
type ClientConfig struct {
TLSCA string `toml:"tls_ca"`
TLSCert string `toml:"tls_cert"`
TLSKey string `toml:"tls_key"`
TLSKeyPwd string `toml:"tls_key_pwd"`
InsecureSkipVerify bool `toml:"insecure_skip_verify"`
ServerName string `toml:"tls_server_name"`
TLSMinVersion string `toml:"tls_min_version"`
TLSMaxVersion string `toml:"tls_max_version"`
TLSCA string
TLSCert string
TLSKey string
TLSKeyPwd string
InsecureSkipVerify bool
ServerName string
TLSMinVersion string
}
// ServerConfig represents the standard server TLS config.
type ServerConfig struct {
TLSCert string `toml:"tls_cert"`
TLSKey string `toml:"tls_key"`
TLSKeyPwd string `toml:"tls_key_pwd"`
TLSAllowedCACerts []string `toml:"tls_allowed_cacerts"`
TLSCipherSuites []string `toml:"tls_cipher_suites"`
TLSMinVersion string `toml:"tls_min_version"`
TLSMaxVersion string `toml:"tls_max_version"`
TLSAllowedDNSNames []string `toml:"tls_allowed_dns_names"`
TLSCert string
TLSKey string
TLSKeyPwd string
TLSAllowedCACerts []string
TLSCipherSuites []string
TLSMinVersion string
TLSMaxVersion string
TLSAllowedDNSNames []string
}
// TLSConfig returns a tls.Config, may be nil without error if TLS is not
@@ -71,16 +70,6 @@ func (c *ClientConfig) TLSConfig() (*tls.Config, error) {
tlsConfig.MinVersion = tls.VersionTLS13
}
if c.TLSMaxVersion == "1.0" {
tlsConfig.MaxVersion = tls.VersionTLS10
} else if c.TLSMaxVersion == "1.1" {
tlsConfig.MaxVersion = tls.VersionTLS11
} else if c.TLSMaxVersion == "1.2" {
tlsConfig.MaxVersion = tls.VersionTLS12
} else if c.TLSMaxVersion == "1.3" {
tlsConfig.MaxVersion = tls.VersionTLS13
}
return tlsConfig, nil
}


@@ -4,7 +4,6 @@ import (
"fmt"
"html/template"
"math"
"reflect"
"regexp"
"strconv"
"time"
@@ -34,10 +33,6 @@ func Timestamp(pattern ...string) string {
return time.Now().Format(defp)
}
func Now() time.Time {
return time.Now()
}
func Args(args ...interface{}) map[string]interface{} {
result := make(map[string]interface{})
for i, a := range args {
@@ -100,27 +95,11 @@ func Humanize1024(s string) string {
return fmt.Sprintf("%.4g%s", v, prefix)
}
func ToString(v interface{}) string {
return fmt.Sprint(v)
}
func HumanizeDuration(s string) string {
v, err := strconv.ParseFloat(s, 64)
if err != nil {
return s
}
return HumanizeDurationFloat64(v)
}
func HumanizeDurationInterface(i interface{}) string {
f, err := ToFloat64(i)
if err != nil {
return ToString(i)
}
return HumanizeDurationFloat64(f)
}
func HumanizeDurationFloat64(v float64) string {
if math.IsNaN(v) || math.IsInf(v, 0) {
return fmt.Sprintf("%.4g", v)
}
@@ -176,179 +155,3 @@ func HumanizePercentageH(s string) string {
}
return fmt.Sprintf("%.2f%%", v)
}
// Add returns the sum of a and b.
func Add(a, b interface{}) (interface{}, error) {
av := reflect.ValueOf(a)
bv := reflect.ValueOf(b)
switch av.Kind() {
case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
switch bv.Kind() {
case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
return av.Int() + bv.Int(), nil
case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64:
return av.Int() + int64(bv.Uint()), nil
case reflect.Float32, reflect.Float64:
return float64(av.Int()) + bv.Float(), nil
default:
return nil, fmt.Errorf("add: unknown type for %q (%T)", bv, b)
}
case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64:
switch bv.Kind() {
case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
return int64(av.Uint()) + bv.Int(), nil
case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64:
return av.Uint() + bv.Uint(), nil
case reflect.Float32, reflect.Float64:
return float64(av.Uint()) + bv.Float(), nil
default:
return nil, fmt.Errorf("add: unknown type for %q (%T)", bv, b)
}
case reflect.Float32, reflect.Float64:
switch bv.Kind() {
case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
return av.Float() + float64(bv.Int()), nil
case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64:
return av.Float() + float64(bv.Uint()), nil
case reflect.Float32, reflect.Float64:
return av.Float() + bv.Float(), nil
default:
return nil, fmt.Errorf("add: unknown type for %q (%T)", bv, b)
}
default:
return nil, fmt.Errorf("add: unknown type for %q (%T)", av, a)
}
}
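
`Add`, `Subtract`, and `Multiply` repeat the same nested kind switch. One way to shrink that, sketched here under assumptions, is to classify each operand once and then pick the result type. The names `numClass`, `classify`, `asFloat`, `asInt`, and `add` are hypothetical. The sketch aims to return the same result types as `Add` (`int64`, `uint64`, or `float64`).

```go
package main

import (
	"fmt"
	"reflect"
)

// numClass groups reflect kinds into the three numeric families Add handles.
type numClass int

const (
	notNumber numClass = iota
	signedInt
	unsignedInt
	floating
)

func classify(v reflect.Value) numClass {
	switch v.Kind() {
	case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
		return signedInt
	case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64:
		return unsignedInt
	case reflect.Float32, reflect.Float64:
		return floating
	default:
		return notNumber
	}
}

// asFloat converts a numeric value of class c to float64.
func asFloat(v reflect.Value, c numClass) float64 {
	switch c {
	case signedInt:
		return float64(v.Int())
	case unsignedInt:
		return float64(v.Uint())
	default:
		return v.Float()
	}
}

// asInt converts a signed or unsigned integer value to int64.
func asInt(v reflect.Value, c numClass) int64 {
	if c == unsignedInt {
		return int64(v.Uint())
	}
	return v.Int()
}

// add is a hypothetical compact variant of Add.
func add(a, b interface{}) (interface{}, error) {
	av, bv := reflect.ValueOf(a), reflect.ValueOf(b)
	ac, bc := classify(av), classify(bv)
	if ac == notNumber {
		return nil, fmt.Errorf("add: unknown type %T", a)
	}
	if bc == notNumber {
		return nil, fmt.Errorf("add: unknown type %T", b)
	}
	switch {
	case ac == floating || bc == floating:
		return asFloat(av, ac) + asFloat(bv, bc), nil // any float operand gives float64
	case ac == unsignedInt && bc == unsignedInt:
		return av.Uint() + bv.Uint(), nil // both unsigned gives uint64
	default:
		return asInt(av, ac) + asInt(bv, bc), nil // otherwise int64
	}
}

func main() {
	fmt.Println(add(1, 2))
	fmt.Println(add(uint8(1), 2.5))
	fmt.Println(add("x", 1))
}
```

The same classification step could back subtraction and multiplication by swapping the operator, at the cost of one more level of indirection than the explicit switches.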
// Subtract returns the difference of b from a.
func Subtract(a, b interface{}) (interface{}, error) {
av := reflect.ValueOf(a)
bv := reflect.ValueOf(b)
switch av.Kind() {
case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
switch bv.Kind() {
case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
return av.Int() - bv.Int(), nil
case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64:
return av.Int() - int64(bv.Uint()), nil
case reflect.Float32, reflect.Float64:
return float64(av.Int()) - bv.Float(), nil
default:
return nil, fmt.Errorf("subtract: unknown type for %q (%T)", bv, b)
}
case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64:
switch bv.Kind() {
case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
return int64(av.Uint()) - bv.Int(), nil
case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64:
return av.Uint() - bv.Uint(), nil
case reflect.Float32, reflect.Float64:
return float64(av.Uint()) - bv.Float(), nil
default:
return nil, fmt.Errorf("subtract: unknown type for %q (%T)", bv, b)
}
case reflect.Float32, reflect.Float64:
switch bv.Kind() {
case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
return av.Float() - float64(bv.Int()), nil
case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64:
return av.Float() - float64(bv.Uint()), nil
case reflect.Float32, reflect.Float64:
return av.Float() - bv.Float(), nil
default:
return nil, fmt.Errorf("subtract: unknown type for %q (%T)", bv, b)
}
default:
return nil, fmt.Errorf("subtract: unknown type for %q (%T)", av, a)
}
}
// Multiply returns the product of a and b.
func Multiply(a, b interface{}) (interface{}, error) {
av := reflect.ValueOf(a)
bv := reflect.ValueOf(b)
switch av.Kind() {
case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
switch bv.Kind() {
case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
return av.Int() * bv.Int(), nil
case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64:
return av.Int() * int64(bv.Uint()), nil
case reflect.Float32, reflect.Float64:
return float64(av.Int()) * bv.Float(), nil
default:
return nil, fmt.Errorf("multiply: unknown type for %q (%T)", bv, b)
}
case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64:
switch bv.Kind() {
case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
return int64(av.Uint()) * bv.Int(), nil
case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64:
return av.Uint() * bv.Uint(), nil
case reflect.Float32, reflect.Float64:
return float64(av.Uint()) * bv.Float(), nil
default:
return nil, fmt.Errorf("multiply: unknown type for %q (%T)", bv, b)
}
case reflect.Float32, reflect.Float64:
switch bv.Kind() {
case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
return av.Float() * float64(bv.Int()), nil
case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64:
return av.Float() * float64(bv.Uint()), nil
case reflect.Float32, reflect.Float64:
return av.Float() * bv.Float(), nil
default:
return nil, fmt.Errorf("multiply: unknown type for %q (%T)", bv, b)
}
default:
return nil, fmt.Errorf("multiply: unknown type for %q (%T)", av, a)
}
}
// Divide returns the quotient of a divided by b.
func Divide(a, b interface{}) (interface{}, error) {
av := reflect.ValueOf(a)
bv := reflect.ValueOf(b)
switch av.Kind() {
case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
switch bv.Kind() {
case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
return av.Int() / bv.Int(), nil
case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64:
return av.Int() / int64(bv.Uint()), nil
case reflect.Float32, reflect.Float64:
return float64(av.Int()) / bv.Float(), nil
default:
return nil, fmt.Errorf("divide: unknown type for %q (%T)", bv, b)
}
case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64:
switch bv.Kind() {
case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
return int64(av.Uint()) / bv.Int(), nil
case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64:
return av.Uint() / bv.Uint(), nil
case reflect.Float32, reflect.Float64:
return float64(av.Uint()) / bv.Float(), nil
default:
return nil, fmt.Errorf("divide: unknown type for %q (%T)", bv, b)
}
case reflect.Float32, reflect.Float64:
switch bv.Kind() {
case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
return av.Float() / float64(bv.Int()), nil
case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64:
return av.Float() / float64(bv.Uint()), nil
case reflect.Float32, reflect.Float64:
return av.Float() / bv.Float(), nil
default:
return nil, fmt.Errorf("divide: unknown type for %q (%T)", bv, b)
}
default:
return nil, fmt.Errorf("divide: unknown type for %q (%T)", av, a)
}
}
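
Review note: in the integer branches of `Divide` above, `av.Int() / bv.Int()` and `av.Uint() / bv.Uint()` cause a run-time panic in Go when the divisor is zero, while the float branches return `±Inf` or `NaN` instead. Below is a minimal sketch of a guard, assuming a hypothetical helper named `safeDivideInt` that is not part of `tplx` and only covers the signed-integer case.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// isSignedInt reports whether k is one of the signed integer kinds.
func isSignedInt(k reflect.Kind) bool {
	switch k {
	case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
		return true
	}
	return false
}

// safeDivideInt is a hypothetical helper (not in the original package).
// It divides two signed integers of any width and returns an error
// instead of panicking when the divisor is zero.
func safeDivideInt(a, b interface{}) (int64, error) {
	av, bv := reflect.ValueOf(a), reflect.ValueOf(b)
	if !isSignedInt(av.Kind()) || !isSignedInt(bv.Kind()) {
		return 0, fmt.Errorf("safeDivideInt: unsupported types %T, %T", a, b)
	}
	if bv.Int() == 0 {
		return 0, errors.New("safeDivideInt: division by zero")
	}
	return av.Int() / bv.Int(), nil
}

func main() {
	q, err := safeDivideInt(7, 2)
	fmt.Println(q, err)

	_, err = safeDivideInt(int8(7), 0)
	fmt.Println(err)
}
```

The same zero check would apply to the unsigned and mixed branches; whether a template function should return an error or a sentinel value here is a design choice the diff does not settle.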

View File

@@ -1,73 +0,0 @@
package tplx
import (
"fmt"
"strconv"
)
// ToFloat64 converts an interface{} value to float64.
func ToFloat64(val interface{}) (float64, error) {
switch v := val.(type) {
case string:
if f, err := strconv.ParseFloat(v, 64); err == nil {
return f, nil
}
// try int
if i, err := strconv.ParseInt(v, 0, 64); err == nil {
return float64(i), nil
}
// try bool
b, err := strconv.ParseBool(v)
if err == nil {
if b {
return 1, nil
} else {
return 0, nil
}
}
if v == "Yes" || v == "yes" || v == "YES" || v == "Y" || v == "ON" || v == "on" || v == "On" || v == "ok" || v == "up" {
return 1, nil
}
if v == "No" || v == "no" || v == "NO" || v == "N" || v == "OFF" || v == "off" || v == "Off" || v == "fail" || v == "err" || v == "down" {
return 0, nil
}
return 0, fmt.Errorf("unparseable value %v", v)
case float64:
return v, nil
case uint64:
return float64(v), nil
case uint32:
return float64(v), nil
case uint16:
return float64(v), nil
case uint8:
return float64(v), nil
case uint:
return float64(v), nil
case int64:
return float64(v), nil
case int32:
return float64(v), nil
case int16:
return float64(v), nil
case int8:
return float64(v), nil
case bool:
if v {
return 1, nil
} else {
return 0, nil
}
case int:
return float64(v), nil
case float32:
return float64(v), nil
default:
return strconv.ParseFloat(fmt.Sprint(v), 64)
}
}
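
Review note: the string branch of `ToFloat64` above tries three parsers in order (float, then integer with base auto-detection, then bool). The sketch below isolates that cascade in a hypothetical function `parseLoose` (the name is an assumption, and the extra "yes"/"on"/"up" aliases are omitted) to show what each stage accepts.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseLoose is a hypothetical reduction of the string branch of ToFloat64.
// It tries ParseFloat, then ParseInt with base 0 (so prefixes such as
// "0x" and "0b" are honoured), then ParseBool.
func parseLoose(s string) (float64, error) {
	if f, err := strconv.ParseFloat(s, 64); err == nil {
		return f, nil
	}
	if i, err := strconv.ParseInt(s, 0, 64); err == nil {
		return float64(i), nil
	}
	if b, err := strconv.ParseBool(s); err == nil {
		if b {
			return 1, nil
		}
		return 0, nil
	}
	return 0, fmt.Errorf("unparseable value %q", s)
}

func main() {
	for _, s := range []string{"3.5", "0x1f", "true", "nope"} {
		v, err := parseLoose(s)
		fmt.Println(s, v, err)
	}
}
```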

View File

@@ -2,33 +2,24 @@ package tplx
import (
"html/template"
"net/url"
"regexp"
"strings"
)
var TemplateFuncMap = template.FuncMap{
"escape": url.PathEscape,
"unescaped": Unescaped,
"urlconvert": Urlconvert,
"timeformat": Timeformat,
"timestamp": Timestamp,
"args": Args,
"reReplaceAll": ReReplaceAll,
"match": regexp.MatchString,
"toUpper": strings.ToUpper,
"toLower": strings.ToLower,
"contains": strings.Contains,
"humanize": Humanize,
"humanize1024": Humanize1024,
"humanizeDuration": HumanizeDuration,
"humanizeDurationInterface": HumanizeDurationInterface,
"humanizePercentage": HumanizePercentage,
"humanizePercentageH": HumanizePercentageH,
"add": Add,
"sub": Subtract,
"mul": Multiply,
"div": Divide,
"now": Now,
"toString": ToString,
"unescaped": Unescaped,
"urlconvert": Urlconvert,
"timeformat": Timeformat,
"timestamp": Timestamp,
"args": Args,
"reReplaceAll": ReReplaceAll,
"match": regexp.MatchString,
"toUpper": strings.ToUpper,
"toLower": strings.ToLower,
"contains": strings.Contains,
"humanize": Humanize,
"humanize1024": Humanize1024,
"humanizeDuration": HumanizeDuration,
"humanizePercentage": HumanizePercentage,
"humanizePercentageH": HumanizePercentageH,
}
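
Review note: a minimal sketch of how a `template.FuncMap` like the one above is typically wired into `html/template`. The two-entry map, the `render` helper, and the `RuleName` field are assumptions for illustration and are not taken from the repository.

```go
package main

import (
	"bytes"
	"fmt"
	"html/template"
	"strings"
)

// render parses and executes tpl with a small, hypothetical FuncMap.
// It is a stand-in for TemplateFuncMap, not a copy of it.
func render(tpl string, data interface{}) (string, error) {
	funcs := template.FuncMap{
		"toUpper":  strings.ToUpper,
		"contains": strings.Contains,
	}
	// Funcs must be called before Parse so the parser can resolve the names.
	t, err := template.New("notify").Funcs(funcs).Parse(tpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	data := map[string]string{"RuleName": "disk full"}
	out, err := render(`{{ toUpper .RuleName }}`, data)
	fmt.Println(out, err)
}
```

Templates that call a function missing from the map fail at parse time, so removing entries such as `add` or `toString` from the map can break existing notification templates that use them.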

View File

@@ -1,9 +1,7 @@
package conv
import (
"fmt"
"math"
"strings"
"github.com/prometheus/common/model"
)
@@ -15,12 +13,6 @@ type Vector struct {
Value float64 `json:"value"`
}
func (v *Vector) ReadableValue() string {
ret := fmt.Sprintf("%.5f", v.Value)
ret = strings.TrimRight(ret, "0")
return strings.TrimRight(ret, ".")
}
func ConvertVectors(value model.Value) (lst []Vector) {
if value == nil {
return

View File

@@ -96,9 +96,6 @@ func labelsToLabelsProto(labels model.Metric, rule *models.RecordingRule) (resul
}
result = append(result, nameLs)
for k, v := range labels {
if k == LabelName {
continue
}
if model.LabelNameRE.MatchString(string(k)) {
result = append(result, &prompb.Label{
Name: string(k),

View File

@@ -12,17 +12,13 @@ func AppendLabels(pt *prompb.TimeSeries, target *models.Target) {
return
}
labelKeys := make(map[string]int)
labelKeys := make(map[string]struct{})
for j := 0; j < len(pt.Labels); j++ {
labelKeys[pt.Labels[j].Name] = j
labelKeys[pt.Labels[j].Name] = struct{}{}
}
for key, value := range target.TagsMap {
if index, has := labelKeys[key]; has {
// overwrite labels
if config.C.LabelRewrite {
pt.Labels[index].Value = value
}
if _, has := labelKeys[key]; has {
continue
}
@@ -36,7 +32,7 @@ func AppendLabels(pt *prompb.TimeSeries, target *models.Target) {
if _, has := labelKeys[config.C.BusiGroupLabelKey]; has {
return
}
// Attach the business group name to the data as a tag
if target.GroupId > 0 && len(config.C.BusiGroupLabelKey) > 0 {
bg := memsto.BusiGroupCache.GetByBusiGroupId(target.GroupId)
if bg == nil {

View File

@@ -50,9 +50,6 @@ func SendDingtalk(message DingtalkMessage) {
}
ur := "https://oapi.dingtalk.com/robot/send?access_token=" + u.Path
if strings.HasPrefix(message.Tokens[i], "https://") {
ur = message.Tokens[i]
}
body := dingtalk{
Msgtype: "markdown",
Markdown: dingtalkMarkdown{
@@ -69,7 +66,7 @@ func SendDingtalk(message DingtalkMessage) {
}
}
res, code, err := poster.PostJSON(ur, time.Second*5, body, 3)
res, code, err := poster.PostJSON(ur, time.Second*5, body)
if err != nil {
logger.Errorf("dingtalk_sender: result=fail url=%s code=%d error=%v response=%s", ur, code, err, string(res))
} else {

View File

@@ -1,7 +1,6 @@
package sender
import (
"strings"
"time"
"github.com/didi/nightingale/v5/src/pkg/poster"
@@ -32,9 +31,6 @@ type feishu struct {
func SendFeishu(message FeishuMessage) {
for i := 0; i < len(message.Tokens); i++ {
url := "https://open.feishu.cn/open-apis/bot/v2/hook/" + message.Tokens[i]
if strings.HasPrefix(message.Tokens[i], "https://") {
url = message.Tokens[i]
}
body := feishu{
Msgtype: "text",
Content: feishuContent{
@@ -46,7 +42,7 @@ func SendFeishu(message FeishuMessage) {
},
}
res, code, err := poster.PostJSON(url, time.Second*5, body, 3)
res, code, err := poster.PostJSON(url, time.Second*5, body)
if err != nil {
logger.Errorf("feishu_sender: result=fail url=%s code=%d error=%v response=%s", url, code, err, string(res))
} else {

View File

@@ -1,73 +0,0 @@
package sender
import (
"net/url"
"strings"
"time"
"github.com/didi/nightingale/v5/src/pkg/poster"
"github.com/toolkits/pkg/logger"
)
type MatterMostMessage struct {
Text string
Tokens []string
}
type mm struct {
Channel string `json:"channel"`
Username string `json:"username"`
Text string `json:"text"`
}
func MapStrToStr(arr []string, fn func(s string) string) []string {
var newArray = []string{}
for _, it := range arr {
newArray = append(newArray, fn(it))
}
return newArray
}
func SendMM(message MatterMostMessage) {
for i := 0; i < len(message.Tokens); i++ {
u, err := url.Parse(message.Tokens[i])
if err != nil {
logger.Errorf("mm_sender: failed to parse error=%v", err)
}
v, err := url.ParseQuery(u.RawQuery)
if err != nil {
logger.Errorf("mm_sender: failed to parse query error=%v", err)
}
channels := v["channel"] // do not get
txt := ""
atuser := v["atuser"]
if len(atuser) != 0 {
txt = strings.Join(MapStrToStr(atuser, func(u string) string {
return "@" + u
}), ",") + "\n"
}
username := v.Get("username")
if err != nil {
logger.Errorf("mm_sender: failed to parse error=%v", err)
}
// simple concatenating
ur := u.Scheme + "://" + u.Host + u.Path
for _, channel := range channels {
body := mm{
Channel: channel,
Username: username,
Text: txt + message.Text,
}
res, code, err := poster.PostJSON(ur, time.Second*5, body, 3)
if err != nil {
logger.Errorf("mm_sender: result=fail url=%s code=%d error=%v response=%s", ur, code, err, string(res))
} else {
logger.Infof("mm_sender: result=succ url=%s code=%d response=%s", ur, code, string(res))
}
}
}
}

View File

@@ -1,52 +0,0 @@
package sender
import (
"strings"
"time"
"github.com/didi/nightingale/v5/src/pkg/poster"
"github.com/toolkits/pkg/logger"
)
type TelegramMessage struct {
Text string
Tokens []string
}
type telegram struct {
ParseMode string `json:"parse_mode"`
Text string `json:"text"`
}
func SendTelegram(message TelegramMessage) {
for i := 0; i < len(message.Tokens); i++ {
if !strings.Contains(message.Tokens[i], "/") && !strings.HasPrefix(message.Tokens[i], "https://") {
logger.Errorf("telegram_sender: result=fail invalid token=%s", message.Tokens[i])
continue
}
var url string
if strings.HasPrefix(message.Tokens[i], "https://") {
url = message.Tokens[i]
} else {
array := strings.Split(message.Tokens[i], "/")
if len(array) != 2 {
logger.Errorf("telegram_sender: result=fail invalid token=%s", message.Tokens[i])
continue
}
botToken := array[0]
chatId := array[1]
url = "https://api.telegram.org/bot" + botToken + "/sendMessage?chat_id=" + chatId
}
body := telegram{
ParseMode: "markdown",
Text: message.Text,
}
res, code, err := poster.PostJSON(url, time.Second*5, body, 3)
if err != nil {
logger.Errorf("telegram_sender: result=fail url=%s code=%d error=%v response=%s", url, code, err, string(res))
} else {
logger.Infof("telegram_sender: result=succ url=%s code=%d response=%s", url, code, string(res))
}
}
}

View File

@@ -1,7 +1,6 @@
package sender
import (
"strings"
"time"
"github.com/didi/nightingale/v5/src/pkg/poster"
@@ -25,9 +24,6 @@ type wecom struct {
func SendWecom(message WecomMessage) {
for i := 0; i < len(message.Tokens); i++ {
url := "https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=" + message.Tokens[i]
if strings.HasPrefix(message.Tokens[i], "https://") {
url = message.Tokens[i]
}
body := wecom{
Msgtype: "markdown",
Markdown: wecomMarkdown{
@@ -35,7 +31,7 @@ func SendWecom(message WecomMessage) {
},
}
res, code, err := poster.PostJSON(url, time.Second*5, body, 3)
res, code, err := poster.PostJSON(url, time.Second*5, body)
if err != nil {
logger.Errorf("wecom_sender: result=fail url=%s code=%d error=%v response=%s", url, code, err, string(res))
} else {

View File

@@ -14,12 +14,10 @@ import (
"github.com/gin-gonic/gin"
"github.com/koding/multiconfig"
"github.com/didi/nightingale/v5/src/models"
"github.com/didi/nightingale/v5/src/notifier"
"github.com/didi/nightingale/v5/src/pkg/httpx"
"github.com/didi/nightingale/v5/src/pkg/logx"
"github.com/didi/nightingale/v5/src/pkg/ormx"
"github.com/didi/nightingale/v5/src/pkg/secu"
"github.com/didi/nightingale/v5/src/storage"
)
@@ -28,68 +26,7 @@ var (
once sync.Once
)
func DealConfigCrypto(key string) {
decryptDsn, err := secu.DealWithDecrypt(C.DB.DSN, key)
if err != nil {
fmt.Println("failed to decrypt the db dsn", err)
os.Exit(1)
}
C.DB.DSN = decryptDsn
decryptRedisPwd, err := secu.DealWithDecrypt(C.Redis.Password, key)
if err != nil {
fmt.Println("failed to decrypt the redis password", err)
os.Exit(1)
}
C.Redis.Password = decryptRedisPwd
decryptSmtpPwd, err := secu.DealWithDecrypt(C.SMTP.Pass, key)
if err != nil {
fmt.Println("failed to decrypt the smtp password", err)
os.Exit(1)
}
C.SMTP.Pass = decryptSmtpPwd
decryptHookPwd, err := secu.DealWithDecrypt(C.Alerting.Webhook.BasicAuthPass, key)
if err != nil {
fmt.Println("failed to decrypt the alert webhook password", err)
os.Exit(1)
}
C.Alerting.Webhook.BasicAuthPass = decryptHookPwd
decryptIbexPwd, err := secu.DealWithDecrypt(C.Ibex.BasicAuthPass, key)
if err != nil {
fmt.Println("failed to decrypt the ibex password", err)
os.Exit(1)
}
C.Ibex.BasicAuthPass = decryptIbexPwd
if len(C.Readers) == 0 {
C.Reader.ClusterName = C.ClusterName
C.Readers = append(C.Readers, C.Reader)
}
for index, v := range C.Readers {
decryptReaderPwd, err := secu.DealWithDecrypt(v.BasicAuthPass, key)
if err != nil {
fmt.Printf("failed to decrypt the reader password: %s , error: %s", v.BasicAuthPass, err.Error())
os.Exit(1)
}
C.Readers[index].BasicAuthPass = decryptReaderPwd
}
for index, v := range C.Writers {
decryptWriterPwd, err := secu.DealWithDecrypt(v.BasicAuthPass, key)
if err != nil {
fmt.Printf("failed to decrypt the writer password: %s , error: %s", v.BasicAuthPass, err.Error())
os.Exit(1)
}
C.Writers[index].BasicAuthPass = decryptWriterPwd
}
}
func MustLoad(key string, fpaths ...string) {
func MustLoad(fpaths ...string) {
once.Do(func() {
loaders := []multiconfig.Loader{
&multiconfig.TagLoader{},
@@ -128,21 +65,10 @@ func MustLoad(key string, fpaths ...string) {
}
m.MustLoad(C)
DealConfigCrypto(key)
if C.EngineDelay == 0 {
C.EngineDelay = 120
}
if C.ReaderFrom == "" {
C.ReaderFrom = "config"
}
if C.ReaderFrom == "config" && C.ClusterName == "" {
fmt.Println("configuration ClusterName is blank")
os.Exit(1)
}
if C.Heartbeat.IP == "" {
// auto detect
// C.Heartbeat.IP = fmt.Sprint(GetOutboundIP())
@@ -154,11 +80,7 @@ func MustLoad(key string, fpaths ...string) {
os.Exit(1)
}
if strings.Contains(hostname, "localhost") {
fmt.Println("Warning! hostname contains substring localhost, setting a more unique hostname is recommended")
}
C.Heartbeat.IP = hostname
C.Heartbeat.IP = hostname + "+" + fmt.Sprint(os.Getpid())
// if C.Heartbeat.IP == "" {
// fmt.Println("heartbeat ip auto got is blank")
@@ -167,6 +89,7 @@ func MustLoad(key string, fpaths ...string) {
}
C.Heartbeat.Endpoint = fmt.Sprintf("%s:%d", C.Heartbeat.IP, C.HTTP.Port)
C.Alerting.RedisPub.ChannelKey = C.Alerting.RedisPub.ChannelPrefix + C.ClusterName
if C.Alerting.Webhook.Enable {
if C.Alerting.Webhook.Timeout == "" {
@@ -209,7 +132,7 @@ func MustLoad(key string, fpaths ...string) {
}
if C.WriterOpt.QueueMaxSize <= 0 {
C.WriterOpt.QueueMaxSize = 10000000
C.WriterOpt.QueueMaxSize = 100000
}
if C.WriterOpt.QueuePopSize <= 0 {
@@ -217,42 +140,7 @@ func MustLoad(key string, fpaths ...string) {
}
if C.WriterOpt.QueueCount <= 0 {
C.WriterOpt.QueueCount = 1000
}
if C.WriterOpt.ShardingKey == "" {
C.WriterOpt.ShardingKey = "ident"
}
for i, write := range C.Writers {
if C.Writers[i].ClusterName == "" {
C.Writers[i].ClusterName = C.ClusterName
}
for _, relabel := range write.WriteRelabels {
regex, ok := relabel.Regex.(string)
if !ok {
log.Println("Regex field must be a string")
os.Exit(1)
}
if regex == "" {
regex = "(.*)"
}
relabel.Regex = models.MustNewRegexp(regex)
if relabel.Separator == "" {
relabel.Separator = ";"
}
if relabel.Action == "" {
relabel.Action = "replace"
}
if relabel.Replacement == "" {
relabel.Replacement = "$1"
}
}
C.WriterOpt.QueueCount = 100
}
fmt.Println("heartbeat.ip:", C.Heartbeat.IP)
@@ -262,13 +150,11 @@ func MustLoad(key string, fpaths ...string) {
type Config struct {
RunMode string
ClusterName string // cluster name specified when monitored objects report data
ClusterName string
BusiGroupLabelKey string
AnomalyDataApi []string
EngineDelay int64
DisableUsageReport bool
ReaderFrom string
LabelRewrite bool
ForceUseServerTS bool
Log logx.Config
HTTP httpx.Config
BasicAuth gin.Accounts
@@ -280,13 +166,11 @@ type Config struct {
DB ormx.DBConfig
WriterOpt WriterGlobalOpt
Writers []WriterOptions
Reader PromOption
Readers []PromOption
Reader ReaderOptions
Ibex Ibex
}
type WriterOptions struct {
ClusterName string
type ReaderOptions struct {
Url string
BasicAuthUser string
BasicAuthPass string
@@ -303,15 +187,31 @@ type WriterOptions struct {
MaxIdleConnsPerHost int
Headers []string
}
WriteRelabels []*models.RelabelConfig
type WriterOptions struct {
Url string
BasicAuthUser string
BasicAuthPass string
Timeout int64
DialTimeout int64
TLSHandshakeTimeout int64
ExpectContinueTimeout int64
IdleConnTimeout int64
KeepAlive int64
MaxConnsPerHost int
MaxIdleConns int
MaxIdleConnsPerHost int
Headers []string
}
type WriterGlobalOpt struct {
QueueCount int
QueueMaxSize int
QueuePopSize int
ShardingKey string
}
type HeartbeatConfig struct {
@@ -331,7 +231,6 @@ type SMTPConfig struct {
}
type Alerting struct {
Timeout int64
TemplatesDir string
NotifyConcurrency int
NotifyBuiltinChannels []string
@@ -386,7 +285,7 @@ func (c *Config) IsDebugMode() bool {
// Get preferred outbound ip of this machine
func GetOutboundIP() net.IP {
conn, err := net.Dial("udp", "223.5.5.5:80")
conn, err := net.Dial("udp", "8.8.8.8:80")
if err != nil {
fmt.Println("auto get outbound ip fail:", err)
os.Exit(1)

View File

@@ -1,92 +0,0 @@
package config
import (
"strings"
"sync"
"github.com/didi/nightingale/v5/src/models"
"github.com/didi/nightingale/v5/src/pkg/prom"
)
type PromClientMap struct {
sync.RWMutex
Clients map[string]prom.API
}
var ReaderClients = &PromClientMap{Clients: make(map[string]prom.API)}
func (pc *PromClientMap) Set(clusterName string, c prom.API) {
if c == nil {
return
}
pc.Lock()
defer pc.Unlock()
pc.Clients[clusterName] = c
}
func (pc *PromClientMap) GetClusterNames() []string {
pc.RLock()
defer pc.RUnlock()
var clusterNames []string
for k := range pc.Clients {
clusterNames = append(clusterNames, k)
}
return clusterNames
}
func (pc *PromClientMap) GetCli(cluster string) prom.API {
pc.RLock()
defer pc.RUnlock()
c := pc.Clients[cluster]
return c
}
func (pc *PromClientMap) IsNil(cluster string) bool {
pc.RLock()
defer pc.RUnlock()
c, exists := pc.Clients[cluster]
if !exists {
return true
}
return c == nil
}
// Hit computes the list of effective clusters from the currently active clusters and the rule's cluster configuration
func (pc *PromClientMap) Hit(cluster string) []string {
pc.RLock()
defer pc.RUnlock()
clusters := make([]string, 0, len(pc.Clients))
if cluster == models.ClusterAll {
for c := range pc.Clients {
clusters = append(clusters, c)
}
return clusters
}
ruleClusters := strings.Fields(cluster)
for c := range pc.Clients {
for _, rc := range ruleClusters {
if rc == c {
clusters = append(clusters, c)
continue
}
}
}
return clusters
}
func (pc *PromClientMap) Reset() {
pc.Lock()
defer pc.Unlock()
pc.Clients = make(map[string]prom.API)
}
func (pc *PromClientMap) Del(cluster string) {
pc.Lock()
defer pc.Unlock()
delete(pc.Clients, cluster)
}

View File

@@ -1,89 +0,0 @@
package config
import (
"sync"
"github.com/didi/nightingale/v5/src/pkg/tls"
)
type PromOption struct {
ClusterName string
Url string
BasicAuthUser string
BasicAuthPass string
Timeout int64
DialTimeout int64
UseTLS bool
tls.ClientConfig
MaxIdleConnsPerHost int
Headers []string
}
func (po *PromOption) Equal(target PromOption) bool {
if po.Url != target.Url {
return false
}
if po.BasicAuthUser != target.BasicAuthUser {
return false
}
if po.BasicAuthPass != target.BasicAuthPass {
return false
}
if po.Timeout != target.Timeout {
return false
}
if po.DialTimeout != target.DialTimeout {
return false
}
if po.MaxIdleConnsPerHost != target.MaxIdleConnsPerHost {
return false
}
if len(po.Headers) != len(target.Headers) {
return false
}
for i := 0; i < len(po.Headers); i++ {
if po.Headers[i] != target.Headers[i] {
return false
}
}
return true
}
type PromOptionsStruct struct {
Data map[string]PromOption
sync.RWMutex
}
func (pos *PromOptionsStruct) Set(clusterName string, po PromOption) {
pos.Lock()
pos.Data[clusterName] = po
pos.Unlock()
}
func (pos *PromOptionsStruct) Del(clusterName string) {
pos.Lock()
delete(pos.Data, clusterName)
pos.Unlock()
}
func (pos *PromOptionsStruct) Get(clusterName string) (PromOption, bool) {
pos.RLock()
defer pos.RUnlock()
ret, has := pos.Data[clusterName]
return ret, has
}
// Data key is cluster name
var PromOptions = &PromOptionsStruct{Data: make(map[string]PromOption)}

View File

@@ -1,174 +0,0 @@
package config
import (
"encoding/json"
"fmt"
"net"
"net/http"
"strings"
"time"
"github.com/didi/nightingale/v5/src/models"
"github.com/didi/nightingale/v5/src/pkg/prom"
"github.com/prometheus/client_golang/api"
"github.com/toolkits/pkg/logger"
)
func InitReader() error {
rf := strings.ToLower(strings.TrimSpace(C.ReaderFrom))
if rf == "" || rf == "config" {
if len(C.Readers) == 0 {
C.Reader.ClusterName = C.ClusterName
C.Readers = append(C.Readers, C.Reader)
}
for _, reader := range C.Readers {
err := setClientFromPromOption(reader.ClusterName, reader)
if err != nil {
logger.Errorf("failed to setClientFromPromOption: %v", err)
continue
}
}
return nil
}
if rf == "database" {
return initFromDatabase()
}
return fmt.Errorf("invalid configuration ReaderFrom: %s", rf)
}
func initFromDatabase() error {
go func() {
for {
loadFromDatabase()
time.Sleep(time.Second)
}
}()
return nil
}
func loadFromDatabase() {
clusters, err := models.AlertingEngineGetClusters(C.Heartbeat.Endpoint)
if err != nil {
logger.Errorf("failed to get current cluster, error: %v", err)
return
}
if len(clusters) == 0 {
ReaderClients.Reset()
logger.Warning("no datasource binded to me")
return
}
newCluster := make(map[string]struct{})
for _, cluster := range clusters {
newCluster[cluster] = struct{}{}
ckey := "prom." + cluster + ".option"
cval, err := models.ConfigsGet(ckey)
if err != nil {
logger.Errorf("failed to get ckey: %s, error: %v", ckey, err)
continue
}
if cval == "" {
logger.Debugf("ckey: %s is empty", ckey)
continue
}
var po PromOption
err = json.Unmarshal([]byte(cval), &po)
if err != nil {
logger.Errorf("failed to unmarshal PromOption: %s", err)
continue
}
if ReaderClients.IsNil(cluster) {
// first time
if err = setClientFromPromOption(cluster, po); err != nil {
logger.Errorf("failed to setClientFromPromOption: %v", err)
continue
}
logger.Info("setClientFromPromOption success: ", cluster)
PromOptions.Set(cluster, po)
continue
}
localPo, has := PromOptions.Get(cluster)
if !has || !localPo.Equal(po) {
if err = setClientFromPromOption(cluster, po); err != nil {
logger.Errorf("failed to setClientFromPromOption: %v", err)
continue
}
PromOptions.Set(cluster, po)
}
}
// delete useless cluster
oldClusters := ReaderClients.GetClusterNames()
for _, oldCluster := range oldClusters {
if _, has := newCluster[oldCluster]; !has {
ReaderClients.Del(oldCluster)
PromOptions.Del(oldCluster)
logger.Info("delete cluster: ", oldCluster)
}
}
}
func newClientFromPromOption(po PromOption) (api.Client, error) {
transport := &http.Transport{
// TLSClientConfig: tlsConfig,
Proxy: http.ProxyFromEnvironment,
DialContext: (&net.Dialer{
Timeout: time.Duration(po.DialTimeout) * time.Millisecond,
}).DialContext,
ResponseHeaderTimeout: time.Duration(po.Timeout) * time.Millisecond,
MaxIdleConnsPerHost: po.MaxIdleConnsPerHost,
}
if po.UseTLS {
tlsConfig, err := po.TLSConfig()
if err != nil {
logger.Errorf("new cluster %s fail: %v", po.Url, err)
return nil, err
}
transport.TLSClientConfig = tlsConfig
}
return api.NewClient(api.Config{
Address: po.Url,
RoundTripper: transport,
})
}
func setClientFromPromOption(clusterName string, po PromOption) error {
if clusterName == "" {
return fmt.Errorf("argument clusterName is blank")
}
if po.Url == "" {
return fmt.Errorf("prometheus url is blank")
}
if strings.HasPrefix(po.Url, "https") {
po.UseTLS = true
po.InsecureSkipVerify = true
}
cli, err := newClientFromPromOption(po)
if err != nil {
return fmt.Errorf("failed to newClientFromPromOption: %v", err)
}
logger.Debugf("setClientFromPromOption: %s, %+v", clusterName, po)
ReaderClients.Set(clusterName, prom.NewAPI(cli, prom.ClientOptions{
BasicAuthUser: po.BasicAuthUser,
BasicAuthPass: po.BasicAuthPass,
Headers: po.Headers,
}))
return nil
}

View File

@@ -32,7 +32,7 @@ func callback(event *models.AlertCurEvent) {
url = "http://" + url
}
resp, code, err := poster.PostJSON(url, 5*time.Second, event, 3)
resp, code, err := poster.PostJSON(url, 5*time.Second, event)
if err != nil {
logger.Errorf("event_callback(rule_id=%d url=%s) fail, resp: %s, err: %v, code: %d", event.RuleId, url, string(resp), err, code)
} else {

View File

@@ -45,11 +45,7 @@ func consume(events []interface{}, sema *semaphore.Semaphore) {
func consumeOne(event *models.AlertCurEvent) {
LogEvent(event, "consume")
if err := event.ParseRule("rule_name"); err != nil {
event.RuleName = fmt.Sprintf("failed to parse rule name: %v", err)
}
if err := event.ParseRule("rule_note"); err != nil {
if err := event.ParseRuleNote(); err != nil {
event.RuleNote = fmt.Sprintf("failed to parse rule note: %v", err)
}
@@ -76,10 +72,9 @@ func persist(event *models.AlertCurEvent) {
// Whether the event is an alert or a recovery, record it in the full alert history
if err := his.Add(); err != nil {
logger.Errorf(
"event_persist_his_fail: %v rule_id=%d cluster:%s hash=%s tags=%v timestamp=%d value=%s",
"event_persist_his_fail: %v rule_id=%d hash=%s tags=%v timestamp=%d value=%s",
err,
event.RuleId,
event.Cluster,
event.Hash,
event.TagsJSON,
event.TriggerTime,
@@ -102,10 +97,9 @@ func persist(event *models.AlertCurEvent) {
if event.Id > 0 {
if err := event.Add(); err != nil {
logger.Errorf(
"event_persist_cur_fail: %v rule_id=%d cluster:%s hash=%s tags=%v timestamp=%d value=%s",
"event_persist_cur_fail: %v rule_id=%d hash=%s tags=%v timestamp=%d value=%s",
err,
event.RuleId,
event.Cluster,
event.Hash,
event.TagsJSON,
event.TriggerTime,
@@ -128,10 +122,9 @@ func persist(event *models.AlertCurEvent) {
if event.Id > 0 {
if err := event.Add(); err != nil {
logger.Errorf(
"event_persist_cur_fail: %v rule_id=%d cluster:%s hash=%s tags=%v timestamp=%d value=%s",
"event_persist_cur_fail: %v rule_id=%d hash=%s tags=%v timestamp=%d value=%s",
err,
event.RuleId,
event.Cluster,
event.Hash,
event.TagsJSON,
event.TriggerTime,

View File

@@ -0,0 +1,33 @@
package engine
import (
"strconv"
"strings"
"time"
"github.com/didi/nightingale/v5/src/models"
)
func isNoneffective(timestamp int64, alertRule *models.AlertRule) bool {
if alertRule.Disabled == 1 {
return true
}
tm := time.Unix(timestamp, 0)
triggerTime := tm.Format("15:04")
triggerWeek := strconv.Itoa(int(tm.Weekday()))
if alertRule.EnableStime <= alertRule.EnableEtime {
if triggerTime < alertRule.EnableStime || triggerTime > alertRule.EnableEtime {
return true
}
} else {
if triggerTime < alertRule.EnableStime && triggerTime > alertRule.EnableEtime {
return true
}
}
alertRule.EnableDaysOfWeek = strings.Replace(alertRule.EnableDaysOfWeek, "7", "0", 1)
return !strings.Contains(alertRule.EnableDaysOfWeek, triggerWeek)
}
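
Review note: the time check in `isNoneffective` above compares zero-padded "HH:MM" strings lexicographically and treats a start time later than the end time as a window that wraps past midnight. A minimal sketch of the positive form of that test, using a hypothetical helper `inWindow` that is not in the repository:

```go
package main

import "fmt"

// inWindow reports whether t lies in [start, end]. All three arguments are
// zero-padded "HH:MM" strings, so plain string comparison orders them
// correctly. When start > end the window wraps around midnight.
func inWindow(t, start, end string) bool {
	if start <= end {
		return t >= start && t <= end
	}
	return t >= start || t <= end
}

func main() {
	fmt.Println(inWindow("09:00", "08:00", "18:00")) // daytime window
	fmt.Println(inWindow("23:30", "22:00", "06:00")) // overnight window
	fmt.Println(inWindow("12:00", "22:00", "06:00")) // outside overnight window
}
```

The comparison only works because `time.Format("15:04")` always produces two-digit hours and minutes; a stored value such as "8:00" would sort incorrectly.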

View File

@@ -2,20 +2,15 @@ package engine
import (
"context"
"fmt"
"time"
"github.com/didi/nightingale/v5/src/server/common/sender"
"github.com/didi/nightingale/v5/src/server/config"
promstat "github.com/didi/nightingale/v5/src/server/stat"
"github.com/toolkits/pkg/container/list"
"github.com/toolkits/pkg/logger"
)
var EventQueue = list.NewSafeListLimited(10000000)
func Start(ctx context.Context) error {
err := reloadTpls()
err := initTpls()
if err != nil {
return err
}
@@ -24,40 +19,18 @@ func Start(ctx context.Context) error {
go loopConsume(ctx)
// filter my rules and start worker
//go loopFilterRules(ctx)
go ruleHolder.LoopSyncRules(ctx)
go loopFilterRules(ctx)
go reportQueueSize()
go sender.StartEmailSender()
go initReporter(func(em map[ErrorType]uint64) {
if len(em) == 0 {
return
}
title := fmt.Sprintf("server %s has some errors, please check server logs for detail", config.C.Heartbeat.IP)
msg := ""
for k, v := range em {
msg += fmt.Sprintf("error: %s, count: %d\n", k, v)
}
notifyToMaintainer(title, msg)
})
return nil
}
func Reload() {
err := reloadTpls()
if err != nil {
logger.Error("engine reload err:", err)
}
}
func reportQueueSize() {
for {
time.Sleep(time.Second)
promstat.GaugeAlertQueueSize.Set(float64(EventQueue.Len()))
promstat.GaugeAlertQueueSize.WithLabelValues(config.C.ClusterName).Set(float64(EventQueue.Len()))
}
}

View File

@@ -17,12 +17,11 @@ func LogEvent(event *models.AlertCurEvent, location string, err ...error) {
}
logger.Infof(
"event(%s %s) %s: rule_id=%d cluster:%s %v%s@%d %s",
"event(%s %s) %s: rule_id=%d %v%s@%d %s",
event.Hash,
status,
location,
event.RuleId,
event.Cluster,
event.TagsJSON,
event.TriggerValue,
event.TriggerTime,

src/server/engine/mute.go Normal file
View File

@@ -0,0 +1,69 @@
package engine
import (
"github.com/didi/nightingale/v5/src/models"
"github.com/didi/nightingale/v5/src/server/memsto"
)
// If the optional clock argument is passed, use the time it denotes; otherwise take TriggerTime from the event's fields
func isMuted(event *models.AlertCurEvent, clock ...int64) bool {
mutes, has := memsto.AlertMuteCache.Gets(event.GroupId)
if !has || len(mutes) == 0 {
return false
}
for i := 0; i < len(mutes); i++ {
if matchMute(event, mutes[i], clock...) {
return true
}
}
return false
}
func matchMute(event *models.AlertCurEvent, mute *models.AlertMute, clock ...int64) bool {
ts := event.TriggerTime
if len(clock) > 0 {
ts = clock[0]
}
if ts < mute.Btime || ts > mute.Etime {
return false
}
return matchTags(event.TagsMap, mute.ITags)
}
func matchTag(value string, filter models.TagFilter) bool {
switch filter.Func {
case "==":
return filter.Value == value
case "!=":
return filter.Value != value
case "in":
_, has := filter.Vset[value]
return has
case "not in":
_, has := filter.Vset[value]
return !has
case "=~":
return filter.Regexp.MatchString(value)
case "!~":
return !filter.Regexp.MatchString(value)
}
// unexpected func
return false
}
func matchTags(eventTagsMap map[string]string, itags []models.TagFilter) bool {
for _, filter := range itags {
value, has := eventTagsMap[filter.Key]
if !has {
return false
}
if !matchTag(value, filter) {
return false
}
}
return true
}

View File

@@ -1,202 +0,0 @@
package engine
import (
"strconv"
"strings"
"time"
"github.com/toolkits/pkg/logger"
"github.com/didi/nightingale/v5/src/models"
"github.com/didi/nightingale/v5/src/server/memsto"
)
var AlertMuteStrategies = AlertMuteStrategiesType{
&TimeNonEffectiveMuteStrategy{},
&IdentNotExistsMuteStrategy{},
&BgNotMatchMuteStrategy{},
&EventMuteStrategy{},
}
type AlertMuteStrategiesType []AlertMuteStrategy
func (ss AlertMuteStrategiesType) IsMuted(rule *models.AlertRule, event *models.AlertCurEvent) bool {
for _, s := range ss {
if s.IsMuted(rule, event) {
logger.Debugf("[%T] mute: rule:%+v event:%+v", s, rule, event)
return true
}
}
return false
}
// AlertMuteStrategy is the abstraction for filtering events; returning true means the alert event should not be notified for some reason.
type AlertMuteStrategy interface {
IsMuted(rule *models.AlertRule, event *models.AlertCurEvent) bool
}
// TimeNonEffectiveMuteStrategy filters by the rule's configured effective time; an alert produced outside that time is not sent.
type TimeNonEffectiveMuteStrategy struct{}
func (s *TimeNonEffectiveMuteStrategy) IsMuted(rule *models.AlertRule, event *models.AlertCurEvent) bool {
if rule.Disabled == 1 {
logger.Debugf("[%T] mute: rule_disabled:%d cluster:%s", s, rule.Id, event.Cluster)
return true
}
tm := time.Unix(event.TriggerTime, 0)
triggerTime := tm.Format("15:04")
triggerWeek := strconv.Itoa(int(tm.Weekday()))
enableStime := strings.Fields(rule.EnableStime)
enableEtime := strings.Fields(rule.EnableEtime)
enableDaysOfWeek := strings.Split(rule.EnableDaysOfWeek, ";")
length := len(enableDaysOfWeek)
// enableStime, enableEtime and enableDaysOfWeek always have the same length, so looping over one of them is enough
for i := 0; i < length; i++ {
enableDaysOfWeek[i] = strings.Replace(enableDaysOfWeek[i], "7", "0", 1)
if !strings.Contains(enableDaysOfWeek[i], triggerWeek) {
continue
}
if enableStime[i] <= enableEtime[i] {
if triggerTime < enableStime[i] || triggerTime > enableEtime[i] {
continue
}
} else {
if triggerTime < enableStime[i] && triggerTime > enableEtime[i] {
continue
}
}
// Reaching this point means the current moment falls within one of the rule's effective time ranges, so return false directly
return false
}
return true
}
// IdentNotExistsMuteStrategy filters by whether the ident exists; if it does not, target_up alerts are dropped.
type IdentNotExistsMuteStrategy struct{}
func (s *IdentNotExistsMuteStrategy) IsMuted(rule *models.AlertRule, event *models.AlertCurEvent) bool {
ident, has := event.TagsMap["ident"]
if !has {
return false
}
_, exists := memsto.TargetCache.Get(ident)
// If this is a target_up alert and the ident no longer exists, filter it out
// This check is rather crude, but there is no better approach for now
if !exists && strings.Contains(rule.PromQl, "target_up") {
logger.Debugf("[%T] mute: rule_eval:%d cluster:%s ident:%s", s, rule.Id, event.Cluster, ident)
return true
}
return false
}
// BgNotMatchMuteStrategy filters out machines outside the BG when the rule is set to alert only within its own BG
type BgNotMatchMuteStrategy struct{}
func (s *BgNotMatchMuteStrategy) IsMuted(rule *models.AlertRule, event *models.AlertCurEvent) bool {
// BG-internal alerting is not enabled, so nothing is filtered
if rule.EnableInBG == 0 {
return false
}
ident, has := event.TagsMap["ident"]
if !has {
return false
}
target, exists := memsto.TargetCache.Get(ident)
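// For alert events that carry an ident, check whether the ident's BG and the rule's BG are the same
// If the alert rule is set to take effect only in its own BG, machines in other BGs must not alert because of it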
if exists && target.GroupId != rule.GroupId {
logger.Debugf("[%T] mute: rule_eval:%d cluster:%s", s, rule.Id, event.Cluster)
return true
}
return false
}
type EventMuteStrategy struct{}
var EventMuteStra = new(EventMuteStrategy)
func (s *EventMuteStrategy) IsMuted(rule *models.AlertRule, event *models.AlertCurEvent) bool {
mutes, has := memsto.AlertMuteCache.Gets(event.GroupId)
if !has || len(mutes) == 0 {
return false
}
for i := 0; i < len(mutes); i++ {
if matchMute(event, mutes[i]) {
return true
}
}
return false
}
// matchMute: if the optional clock argument is passed, use the time it represents; otherwise take TriggerTime from the event
func matchMute(event *models.AlertCurEvent, mute *models.AlertMute, clock ...int64) bool {
if mute.Disabled == 1 {
return false
}
ts := event.TriggerTime
if len(clock) > 0 {
ts = clock[0]
}
// If the mute is not global, check the cluster
if mute.Cluster != models.ClusterAll {
// mute.Cluster is a string that may combine several clusters, e.g. "cluster1 cluster2"
clusters := strings.Fields(mute.Cluster)
cm := make(map[string]struct{}, len(clusters))
for i := 0; i < len(clusters); i++ {
cm[clusters[i]] = struct{}{}
}
// Check whether event.Cluster is contained in cm
if _, has := cm[event.Cluster]; !has {
return false
}
}
if ts < mute.Btime || ts > mute.Etime {
return false
}
return matchTags(event.TagsMap, mute.ITags)
}
func matchTag(value string, filter models.TagFilter) bool {
switch filter.Func {
case "==":
return filter.Value == value
case "!=":
return filter.Value != value
case "in":
_, has := filter.Vset[value]
return has
case "not in":
_, has := filter.Vset[value]
return !has
case "=~":
return filter.Regexp.MatchString(value)
case "!~":
return !filter.Regexp.MatchString(value)
}
// unexpected func
return false
}
func matchTags(eventTagsMap map[string]string, itags []models.TagFilter) bool {
for _, filter := range itags {
value, has := eventTagsMap[filter.Key]
if !has {
return false
}
if !matchTag(value, filter) {
return false
}
}
return true
}

View File

@@ -10,7 +10,6 @@ import (
"os/exec"
"path"
"strings"
"sync"
"time"
"github.com/pkg/errors"
@@ -30,12 +29,9 @@ import (
"github.com/didi/nightingale/v5/src/storage"
)
var (
tpls map[string]*template.Template
rwLock sync.RWMutex
)
var tpls = make(map[string]*template.Template)
func reloadTpls() error {
func initTpls() error {
if config.C.Alerting.TemplatesDir == "" {
config.C.Alerting.TemplatesDir = path.Join(runner.Cwd, "etc", "template")
}
@@ -60,7 +56,6 @@ func reloadTpls() error {
return errors.New("no tpl files under " + config.C.Alerting.TemplatesDir)
}
tmpTpls := make(map[string]*template.Template)
for i := 0; i < len(tplFiles); i++ {
tplpath := path.Join(config.C.Alerting.TemplatesDir, tplFiles[i])
@@ -69,12 +64,9 @@ func reloadTpls() error {
return errors.WithMessage(err, "failed to parse tpl: "+tplpath)
}
tmpTpls[tplFiles[i]] = tpl
tpls[tplFiles[i]] = tpl
}
rwLock.Lock()
tpls = tmpTpls
rwLock.Unlock()
return nil
}
@@ -86,9 +78,6 @@ type Notice struct {
func genNotice(event *models.AlertCurEvent) Notice {
// build notice body with templates
ntpls := make(map[string]string)
rwLock.RLock()
defer rwLock.RUnlock()
for filename, tpl := range tpls {
var body bytes.Buffer
if err := tpl.Execute(&body, event); err != nil {
@@ -101,13 +90,12 @@ func genNotice(event *models.AlertCurEvent) Notice {
return Notice{Event: event, Tpls: ntpls}
}
func alertingRedisPub(clusterName string, bs []byte) {
channelKey := config.C.Alerting.RedisPub.ChannelPrefix + clusterName
func alertingRedisPub(bs []byte) {
// pub all alerts to redis
if config.C.Alerting.RedisPub.Enable {
err := storage.Redis.Publish(context.Background(), channelKey, bs).Err()
err := storage.Redis.Publish(context.Background(), config.C.Alerting.RedisPub.ChannelKey, bs).Err()
if err != nil {
logger.Errorf("event_notify: redis publish %s err: %v", channelKey, err)
logger.Errorf("event_notify: redis publish %s err: %v", config.C.Alerting.RedisPub.ChannelKey, err)
}
}
}
@@ -125,8 +113,6 @@ func handleNotice(notice Notice, bs []byte) {
wecomset := make(map[string]struct{})
dingtalkset := make(map[string]struct{})
feishuset := make(map[string]struct{})
mmset := make(map[string]struct{})
telegramset := make(map[string]struct{})
for _, user := range notice.Event.NotifyUsersObj {
if user.Email != "" {
@@ -157,16 +143,6 @@ func handleNotice(notice Notice, bs []byte) {
if ret.Exists() {
feishuset[ret.String()] = struct{}{}
}
ret = gjson.GetBytes(bs, "mm_webhook_url")
if ret.Exists() {
mmset[ret.String()] = struct{}{}
}
ret = gjson.GetBytes(bs, "telegram_robot_token")
if ret.Exists() {
telegramset[ret.String()] = struct{}{}
}
}
phones := StringSetKeys(phoneset)
@@ -248,40 +224,6 @@ func handleNotice(notice Notice, bs []byte) {
AtMobiles: phones,
Tokens: StringSetKeys(feishuset),
})
case "mm":
if len(mmset) == 0 {
continue
}
if !slice.ContainsString(config.C.Alerting.NotifyBuiltinChannels, "mm") {
continue
}
content, has := notice.Tpls["mm.tpl"]
if !has {
content = "mm.tpl not found"
}
sender.SendMM(sender.MatterMostMessage{
Text: content,
Tokens: StringSetKeys(mmset),
})
case "telegram":
if len(telegramset) == 0 {
continue
}
if !slice.ContainsString(config.C.Alerting.NotifyBuiltinChannels, "telegram") {
continue
}
content, has := notice.Tpls["telegram.tpl"]
if !has {
content = "telegram.tpl not found"
}
sender.SendTelegram(sender.TelegramMessage{
Text: content,
Tokens: StringSetKeys(telegramset),
})
}
}
}
@@ -296,7 +238,7 @@ func notify(event *models.AlertCurEvent) {
return
}
alertingRedisPub(event.Cluster, stdinBytes)
alertingRedisPub(stdinBytes)
alertingWebhook(event)
handleNotice(notice, stdinBytes)
@@ -374,24 +316,6 @@ func handleSubscribes(event models.AlertCurEvent, subs []*models.AlertSubscribe)
}
func handleSubscribe(event models.AlertCurEvent, sub *models.AlertSubscribe) {
if sub.IsDisabled() {
return
}
// If the subscription is not global, check the cluster
if sub.Cluster != models.ClusterAll {
// sub.Cluster is a string that may combine several clusters, e.g. "cluster1 cluster2"
clusters := strings.Fields(sub.Cluster)
cm := make(map[string]struct{}, len(clusters))
for i := 0; i < len(clusters); i++ {
cm[clusters[i]] = struct{}{}
}
if _, has := cm[event.Cluster]; !has {
return
}
}
if !matchTags(event.TagsMap, sub.ITags) {
return
}
@@ -435,10 +359,6 @@ func alertingCallScript(stdinBytes []byte) {
return
}
if config.C.Alerting.Timeout == 0 {
config.C.Alerting.Timeout = 30000
}
fpath := config.C.Alerting.CallScript.ScriptPath
cmd := exec.Command(fpath)
cmd.Stdin = bytes.NewReader(stdinBytes)
@@ -454,7 +374,7 @@ func alertingCallScript(stdinBytes []byte) {
return
}
err, isTimeout := sys.WrapTimeout(cmd, time.Duration(config.C.Alerting.Timeout)*time.Millisecond)
err, isTimeout := sys.WrapTimeout(cmd, time.Duration(30)*time.Second)
if isTimeout {
if err == nil {

View File

@@ -2,6 +2,7 @@ package engine
import (
"encoding/json"
"runtime"
"time"
"github.com/didi/nightingale/v5/src/models"
@@ -19,30 +20,20 @@ type MaintainMessage struct {
Content string `json:"content"`
}
// notify to maintainer to handle the error
func notifyToMaintainer(title, msg string) {
logger.Errorf("notifyToMaintainer, msg: %s", msg)
users := memsto.UserCache.GetMaintainerUsers()
if len(users) == 0 {
func notifyMaintainerWithPlugin(e error, title, triggerTime string, users []*models.User) {
if !config.C.Alerting.CallPlugin.Enable {
return
}
triggerTime := time.Now().Format("2006/01/02 - 15:04:05")
notifyMaintainerWithPlugin(title, msg, triggerTime, users)
notifyMaintainerWithBuiltin(title, msg, triggerTime, users)
}
func notifyMaintainerWithPlugin(title, msg, triggerTime string, users []*models.User) {
if !config.C.Alerting.CallPlugin.Enable {
if runtime.GOOS == "windows" {
logger.Errorf("call notify plugin on unsupported os: %s", runtime.GOOS)
return
}
stdinBytes, err := json.Marshal(MaintainMessage{
Tos: users,
Title: title,
Content: "Title: " + title + "\nContent: " + msg + "\nTime: " + triggerTime,
Content: "Title: " + title + "\nContent: " + e.Error() + "\nTime: " + triggerTime,
})
if err != nil {
@@ -54,7 +45,22 @@ func notifyMaintainerWithPlugin(title, msg, triggerTime string, users []*models.
logger.Debugf("notify maintainer with plugin done")
}
func notifyMaintainerWithBuiltin(title, msg, triggerTime string, users []*models.User) {
// notify to maintainer to handle the error
func notifyToMaintainer(e error, title string) {
logger.Errorf("notifyToMaintainer, title:%s, error:%v", title, e)
users := memsto.UserCache.GetMaintainerUsers()
if len(users) == 0 {
return
}
triggerTime := time.Now().Format("2006/01/02 - 15:04:05")
notifyMaintainerWithPlugin(e, title, triggerTime, users)
notifyMaintainerWithBuiltin(e, title, triggerTime, users)
}
func notifyMaintainerWithBuiltin(e error, title, triggerTime string, users []*models.User) {
if len(config.C.Alerting.NotifyBuiltinChannels) == 0 {
return
}
@@ -64,8 +70,6 @@ func notifyMaintainerWithBuiltin(title, msg, triggerTime string, users []*models
wecomset := make(map[string]struct{})
dingtalkset := make(map[string]struct{})
feishuset := make(map[string]struct{})
mmset := make(map[string]struct{})
telegramset := make(map[string]struct{})
for _, user := range users {
if user.Email != "" {
@@ -96,16 +100,6 @@ func notifyMaintainerWithBuiltin(title, msg, triggerTime string, users []*models
if ret.Exists() {
feishuset[ret.String()] = struct{}{}
}
ret = gjson.GetBytes(bs, "mm_webhook_url")
if ret.Exists() {
mmset[ret.String()] = struct{}{}
}
ret = gjson.GetBytes(bs, "telegram_robot_token")
if ret.Exists() {
telegramset[ret.String()] = struct{}{}
}
}
phones := StringSetKeys(phoneset)
@@ -116,13 +110,13 @@ func notifyMaintainerWithBuiltin(title, msg, triggerTime string, users []*models
if len(emailset) == 0 {
continue
}
content := "Title: " + title + "\nContent: " + msg + "\nTime: " + triggerTime
content := "Title: " + title + "\nContent: " + e.Error() + "\nTime: " + triggerTime
sender.WriteEmail(title, content, StringSetKeys(emailset))
case "dingtalk":
if len(dingtalkset) == 0 {
continue
}
content := "**Title: **" + title + "\n**Content: **" + msg + "\n**Time: **" + triggerTime
content := "**Title: **" + title + "\n**Content: **" + e.Error() + "\n**Time: **" + triggerTime
sender.SendDingtalk(sender.DingtalkMessage{
Title: title,
Text: content,
@@ -133,7 +127,7 @@ func notifyMaintainerWithBuiltin(title, msg, triggerTime string, users []*models
if len(wecomset) == 0 {
continue
}
content := "**Title: **" + title + "\n**Content: **" + msg + "\n**Time: **" + triggerTime
content := "**Title: **" + title + "\n**Content: **" + e.Error() + "\n**Time: **" + triggerTime
sender.SendWecom(sender.WecomMessage{
Text: content,
Tokens: StringSetKeys(wecomset),
@@ -143,30 +137,12 @@ func notifyMaintainerWithBuiltin(title, msg, triggerTime string, users []*models
continue
}
content := "Title: " + title + "\nContent: " + msg + "\nTime: " + triggerTime
content := "Title: " + title + "\nContent: " + e.Error() + "\nTime: " + triggerTime
sender.SendFeishu(sender.FeishuMessage{
Text: content,
AtMobiles: phones,
Tokens: StringSetKeys(feishuset),
})
case "mm":
if len(mmset) == 0 {
continue
}
content := "**Title: **" + title + "\n**Content: **" + msg + "\n**Time: **" + triggerTime
sender.SendMM(sender.MatterMostMessage{
Text: content,
Tokens: StringSetKeys(mmset),
})
case "telegram":
if len(telegramset) == 0 {
continue
}
content := "**Title: **" + title + "\n**Content: **" + msg + "\n**Time: **" + triggerTime
sender.SendTelegram(sender.TelegramMessage{
Text: content,
Tokens: StringSetKeys(telegramset),
})
}
}
}

View File

@@ -0,0 +1,5 @@
package engine
import "github.com/toolkits/pkg/container/list"
var EventQueue = list.NewSafeListLimited(10000000)

View File

@@ -1,65 +0,0 @@
package engine
import (
"sync"
"time"
)
type ErrorType string
// register new error here
const (
QueryPrometheusError ErrorType = "QueryPrometheusError"
RuntimeError ErrorType = "RuntimeError"
)
type reporter struct {
sync.Mutex
em map[ErrorType]uint64
cb func(em map[ErrorType]uint64)
}
var rp reporter
func initReporter(cb func(em map[ErrorType]uint64)) {
rp = reporter{cb: cb, em: make(map[ErrorType]uint64)}
rp.Start()
}
func Report(errorType ErrorType) {
rp.report(errorType)
}
func (r *reporter) reset() map[ErrorType]uint64 {
r.Lock()
defer r.Unlock()
if len(r.em) == 0 {
return nil
}
oem := r.em
r.em = make(map[ErrorType]uint64)
return oem
}
func (r *reporter) report(errorType ErrorType) {
r.Lock()
defer r.Unlock()
if count, has := r.em[errorType]; has {
r.em[errorType] = count + 1
} else {
r.em[errorType] = 1
}
}
func (r *reporter) Start() {
for {
select {
case <-time.After(time.Minute):
cur := r.reset()
if cur != nil {
r.cb(cur)
}
}
}
}
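The removed reporter counts errors per type under a mutex and swaps in a fresh map on each flush. A minimal sketch of that count-and-swap pattern follows; `counter` and `newCounter` are hypothetical names, and the once-a-minute loop and callback are left out so the example stays deterministic.

```go
package main

import (
	"fmt"
	"sync"
)

type ErrorType string

// counter is a hypothetical, trimmed-down version of the reporter in the diff.
type counter struct {
	mu sync.Mutex
	em map[ErrorType]uint64
}

func newCounter() *counter {
	return &counter{em: make(map[ErrorType]uint64)}
}

// report increments the count for one error type.
func (c *counter) report(t ErrorType) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.em[t]++ // a missing key reads as zero, so no existence check is needed
}

// reset returns the accumulated counts and starts a fresh map.
// It returns nil when nothing was reported since the last reset.
func (c *counter) reset() map[ErrorType]uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	if len(c.em) == 0 {
		return nil
	}
	old := c.em
	c.em = make(map[ErrorType]uint64)
	return old
}

func main() {
	c := newCounter()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.report("QueryPrometheusError")
		}()
	}
	wg.Wait()
	fmt.Println(c.reset())
	fmt.Println(c.reset() == nil)
}
```

Handing the old map to the caller and allocating a new one keeps the lock held only briefly, so a slow callback does not block concurrent `report` calls.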

View File

@@ -1,166 +0,0 @@
package engine
import (
"context"
"fmt"
"strings"
"sync"
"time"
"github.com/didi/nightingale/v5/src/server/config"
"github.com/didi/nightingale/v5/src/server/memsto"
"github.com/didi/nightingale/v5/src/server/naming"
)
type RuleContext interface {
Key() string
Hash() string
Prepare()
Start()
Eval()
Stop()
}
var ruleHolder = &RuleHolder{
alertRules: make(map[string]RuleContext),
recordRules: make(map[string]RuleContext),
externalAlertRules: make(map[string]*AlertRuleContext),
}
type RuleHolder struct {
externalLock sync.RWMutex
// key: hash
alertRules map[string]RuleContext
// key: hash
recordRules map[string]RuleContext
// key: key
externalAlertRules map[string]*AlertRuleContext
}
func (rh *RuleHolder) LoopSyncRules(ctx context.Context) {
time.Sleep(time.Duration(config.C.EngineDelay) * time.Second)
duration := 9000 * time.Millisecond
for {
select {
case <-ctx.Done():
return
case <-time.After(duration):
rh.SyncAlertRules()
rh.SyncRecordRules()
}
}
}
func (rh *RuleHolder) SyncAlertRules() {
ids := memsto.AlertRuleCache.GetRuleIds()
alertRules := make(map[string]RuleContext)
externalAllRules := make(map[string]*AlertRuleContext)
for _, id := range ids {
rule := memsto.AlertRuleCache.Get(id)
if rule == nil {
continue
}
// If the rule is not evaluated by the prometheus engine, create it as an externalRule
if !rule.IsPrometheusRule() {
ruleClusters := strings.Fields(rule.Cluster)
for _, cluster := range ruleClusters {
// hash ring not hit
if !naming.ClusterHashRing.IsHit(cluster, fmt.Sprintf("%d", rule.Id), config.C.Heartbeat.Endpoint) {
continue
}
externalRule := NewAlertRuleContext(rule, cluster)
externalAllRules[externalRule.Key()] = externalRule
}
continue
}
ruleClusters := config.ReaderClients.Hit(rule.Cluster)
for _, cluster := range ruleClusters {
// hash ring not hit
if !naming.ClusterHashRing.IsHit(cluster, fmt.Sprintf("%d", rule.Id), config.C.Heartbeat.Endpoint) {
continue
}
alertRule := NewAlertRuleContext(rule, cluster)
alertRules[alertRule.Hash()] = alertRule
}
}
for hash, rule := range alertRules {
if _, has := rh.alertRules[hash]; !has {
rule.Prepare()
rule.Start()
rh.alertRules[hash] = rule
}
}
for hash, rule := range rh.alertRules {
if _, has := alertRules[hash]; !has {
rule.Stop()
delete(rh.alertRules, hash)
}
}
for hash, rule := range externalAllRules {
rh.externalLock.Lock()
if _, has := rh.externalAlertRules[hash]; !has {
rule.Prepare()
rh.externalAlertRules[hash] = rule
}
rh.externalLock.Unlock()
}
rh.externalLock.Lock()
for hash := range rh.externalAlertRules {
if _, has := externalAllRules[hash]; !has {
delete(rh.externalAlertRules, hash)
}
}
rh.externalLock.Unlock()
}
func (rh *RuleHolder) SyncRecordRules() {
ids := memsto.RecordingRuleCache.GetRuleIds()
recordRules := make(map[string]RuleContext)
for _, id := range ids {
rule := memsto.RecordingRuleCache.Get(id)
if rule == nil {
continue
}
ruleClusters := config.ReaderClients.Hit(rule.Cluster)
for _, cluster := range ruleClusters {
if !naming.ClusterHashRing.IsHit(cluster, fmt.Sprintf("%d", rule.Id), config.C.Heartbeat.Endpoint) {
continue
}
recordRule := NewRecordRuleContext(rule, cluster)
recordRules[recordRule.Hash()] = recordRule
}
}
for hash, rule := range recordRules {
if _, has := rh.recordRules[hash]; !has {
rule.Prepare()
rule.Start()
rh.recordRules[hash] = rule
}
}
for hash, rule := range rh.recordRules {
if _, has := recordRules[hash]; !has {
rule.Stop()
delete(rh.recordRules, hash)
}
}
}
func GetExternalAlertRule(cluster string, id int64) (*AlertRuleContext, bool) {
key := fmt.Sprintf("alert-%s-%d", cluster, id)
ruleHolder.externalLock.RLock()
defer ruleHolder.externalLock.RUnlock()
rule, has := ruleHolder.externalAlertRules[key]
return rule, has
}
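`SyncAlertRules` and `SyncRecordRules` both reconcile a map of running contexts against a freshly computed map, keyed by hash: start what is new, stop what is gone. The sketch below isolates that reconciliation; `runner`, `syncRunning`, and `fakeRule` are hypothetical names, and `runner` is a reduced form of the `RuleContext` interface without `Prepare`, `Eval`, `Key`, or `Hash`.

```go
package main

import "fmt"

// runner is a hypothetical, reduced form of the RuleContext interface.
type runner interface {
	Start()
	Stop()
}

// syncRunning reconciles running against desired, both keyed by rule hash.
func syncRunning(running, desired map[string]runner) {
	// Start contexts that are desired but not yet running.
	for hash, r := range desired {
		if _, has := running[hash]; !has {
			r.Start()
			running[hash] = r
		}
	}
	// Stop contexts that are running but no longer desired.
	for hash, r := range running {
		if _, has := desired[hash]; !has {
			r.Stop()
			delete(running, hash) // deleting during range is safe in Go
		}
	}
}

// fakeRule records Start and Stop calls so the order can be inspected.
type fakeRule struct {
	name   string
	events *[]string
}

func (f fakeRule) Start() { *f.events = append(*f.events, "start "+f.name) }
func (f fakeRule) Stop()  { *f.events = append(*f.events, "stop "+f.name) }

func main() {
	var events []string
	running := map[string]runner{"a": fakeRule{"a", &events}}
	desired := map[string]runner{"b": fakeRule{"b", &events}}
	syncRunning(running, desired)
	fmt.Println(events, len(running))
}
```

Because the hash in the diff covers the rule id, evaluation interval, PromQL, and cluster, editing any of those produces a new hash, so the old context is stopped and a new one started on the next sync.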

View File

@@ -1,300 +0,0 @@
package engine
import (
"context"
"fmt"
"strings"
"time"
"github.com/prometheus/common/model"
"github.com/toolkits/pkg/logger"
"github.com/toolkits/pkg/str"
"github.com/didi/nightingale/v5/src/models"
"github.com/didi/nightingale/v5/src/pkg/prom"
"github.com/didi/nightingale/v5/src/server/common/conv"
"github.com/didi/nightingale/v5/src/server/config"
"github.com/didi/nightingale/v5/src/server/memsto"
promstat "github.com/didi/nightingale/v5/src/server/stat"
)
type AlertRuleContext struct {
cluster string
quit chan struct{}
rule *models.AlertRule
fires *AlertCurEventMap
pendings *AlertCurEventMap
}
func NewAlertRuleContext(rule *models.AlertRule, cluster string) *AlertRuleContext {
return &AlertRuleContext{
cluster: cluster,
quit: make(chan struct{}),
rule: rule,
}
}
func (arc *AlertRuleContext) RuleFromCache() *models.AlertRule {
return memsto.AlertRuleCache.Get(arc.rule.Id)
}
func (arc *AlertRuleContext) Key() string {
return fmt.Sprintf("alert-%s-%d", arc.cluster, arc.rule.Id)
}
func (arc *AlertRuleContext) Hash() string {
return str.MD5(fmt.Sprintf("%d_%d_%s_%s",
arc.rule.Id,
arc.rule.PromEvalInterval,
arc.rule.PromQl,
arc.cluster,
))
}
func (arc *AlertRuleContext) Prepare() {
arc.recoverAlertCurEventFromDb()
}
func (arc *AlertRuleContext) Start() {
logger.Infof("eval:%s started", arc.Key())
interval := arc.rule.PromEvalInterval
if interval <= 0 {
interval = 10
}
go func() {
for {
select {
case <-arc.quit:
return
default:
arc.Eval()
time.Sleep(time.Duration(interval) * time.Second)
}
}
}()
}
func (arc *AlertRuleContext) Eval() {
promql := strings.TrimSpace(arc.rule.PromQl)
if promql == "" {
logger.Errorf("rule_eval:%s promql is blank", arc.Key())
return
}
if config.ReaderClients.IsNil(arc.cluster) {
logger.Errorf("rule_eval:%s error reader client is nil", arc.Key())
return
}
readerClient := config.ReaderClients.GetCli(arc.cluster)
var value model.Value
var err error
cachedRule := arc.RuleFromCache()
if cachedRule == nil {
logger.Errorf("rule_eval:%s rule not found", arc.Key())
return
}
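// With a single goroutine it would be fine to assign cachedRule to arc.rule
// But in the externalRule scenario HandleVectors/RecoverSingle are called as well, so that does not work; fetch the rule from the cache when needed instead
// arc.rule = cachedRule
// If the cached rule has been changed from a prometheus rule to another type, there is no need to query prometheus any more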
if cachedRule.IsPrometheusRule() {
var warnings prom.Warnings
value, warnings, err = readerClient.Query(context.Background(), promql, time.Now())
if err != nil {
logger.Errorf("rule_eval:%s promql:%s, error:%v", arc.Key(), promql, err)
//notifyToMaintainer(err, "failed to query prometheus")
Report(QueryPrometheusError)
return
}
if len(warnings) > 0 {
logger.Errorf("rule_eval:%s promql:%s, warnings:%v", arc.Key(), promql, warnings)
return
}
logger.Debugf("rule_eval:%s promql:%s, value:%v", arc.Key(), promql, value)
}
arc.HandleVectors(conv.ConvertVectors(value), "inner")
}
func (arc *AlertRuleContext) HandleVectors(vectors []conv.Vector, from string) {
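// Some of the rule's settings may have changed, such as alert receivers or callbacks
// Such changes do not restart the worker, yet they do affect how alerts are handled
// So fetch the rule from memsto.AlertRuleCache here and use it instead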
cachedRule := arc.RuleFromCache()
if cachedRule == nil {
logger.Errorf("rule_eval:%s rule not found", arc.Key())
return
}
now := time.Now().Unix()
alertingKeys := map[string]struct{}{}
for _, vector := range vectors {
alertVector := NewAlertVector(arc, cachedRule, vector, from)
event := alertVector.BuildEvent(now)
// A muted event is still in the firing state, so always add it to alertingKeys to keep a firing event from being recovered automatically
alertingKeys[alertVector.Hash()] = struct{}{}
if AlertMuteStrategies.IsMuted(cachedRule, event) {
logger.Debugf("rule_eval:%s event:%+v is muted", arc.Key(), event)
continue
}
arc.handleEvent(event)
}
arc.HandleRecover(alertingKeys, now)
}
func (arc *AlertRuleContext) HandleRecover(alertingKeys map[string]struct{}, now int64) {
for _, hash := range arc.pendings.Keys() {
if _, has := alertingKeys[hash]; has {
continue
}
arc.pendings.Delete(hash)
}
for hash := range arc.fires.GetAll() {
if _, has := alertingKeys[hash]; has {
continue
}
arc.RecoverSingle(hash, now, nil)
}
}
func (arc *AlertRuleContext) RecoverSingle(hash string, now int64, value *string) {
cachedRule := arc.RuleFromCache()
if cachedRule == nil {
logger.Errorf("rule_eval:%s rule not found", arc.Key())
return
}
event, has := arc.fires.Get(hash)
if !has {
return
}
// If an observation period is configured, the event cannot be recovered immediately
if cachedRule.RecoverDuration > 0 && now-event.LastEvalTime < cachedRule.RecoverDuration {
return
}
if value != nil {
event.TriggerValue = *value
}
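// No vector crossing the threshold was found, so assume for now that this vector has recovered
// There is no way to tell whether Prometheus holds values that simply did not meet the threshold, or whether points were lost and there is no data to return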
arc.fires.Delete(hash)
arc.pendings.Delete(hash)
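// The recovery may be the result of an edited promql, so the event should carry the latest promql; otherwise users would be confused
// Any other field of the rule may have changed as well, so update them all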
cachedRule.UpdateEvent(event)
event.IsRecovered = true
event.LastEvalTime = now
arc.pushEventToQueue(event)
}
func (arc *AlertRuleContext) handleEvent(event *models.AlertCurEvent) {
if event == nil {
logger.Debugf("rule_eval:%s event:%+v is nil", arc.Key(), event)
return
}
if event.PromForDuration == 0 {
arc.fireEvent(event)
return
}
var preTriggerTime int64
preEvent, has := arc.pendings.Get(event.Hash)
if has {
arc.pendings.UpdateLastEvalTime(event.Hash, event.LastEvalTime)
preTriggerTime = preEvent.TriggerTime
} else {
arc.pendings.Set(event.Hash, event)
preTriggerTime = event.TriggerTime
}
if event.LastEvalTime-preTriggerTime+int64(event.PromEvalInterval) >= int64(event.PromForDuration) {
arc.fireEvent(event)
}
}
func (arc *AlertRuleContext) fireEvent(event *models.AlertCurEvent) {
// As arc.rule maybe outdated, use rule from cache
cachedRule := arc.RuleFromCache()
if cachedRule == nil {
logger.Errorf("rule_eval:%s event:%+v is nil", arc.Key(), event)
return
}
if fired, has := arc.fires.Get(event.Hash); has {
arc.fires.UpdateLastEvalTime(event.Hash, event.LastEvalTime)
if cachedRule.NotifyRepeatStep == 0 {
// Repeat notifications are not wanted, so return directly; nothing to do
logger.Debugf("rule_eval:%s event:%+v nothing to do", arc.Key(), event)
return
}
// An alert was sent before; whether to send again depends on whether the channel's silence period has passed
if event.LastEvalTime > fired.LastSentTime+int64(cachedRule.NotifyRepeatStep)*60 {
if cachedRule.NotifyMaxNumber == 0 {
// A maximum send count of 0 means no limit, so keep sending
event.NotifyCurNumber = fired.NotifyCurNumber + 1
event.FirstTriggerTime = fired.FirstTriggerTime
arc.pushEventToQueue(event)
} else {
// With a maximum send count, check how many have been sent and whether the limit has been reached
if fired.NotifyCurNumber >= cachedRule.NotifyMaxNumber {
logger.Debugf("rule_eval:%s event:%+v notify to max number", arc.Key(), event)
return
} else {
event.NotifyCurNumber = fired.NotifyCurNumber + 1
event.FirstTriggerTime = fired.FirstTriggerTime
arc.pushEventToQueue(event)
}
}
}
} else {
event.NotifyCurNumber = 1
event.FirstTriggerTime = event.TriggerTime
arc.pushEventToQueue(event)
}
}
func (arc *AlertRuleContext) pushEventToQueue(event *models.AlertCurEvent) {
if !event.IsRecovered {
event.LastSentTime = event.LastEvalTime
arc.fires.Set(event.Hash, event)
}
promstat.CounterAlertsTotal.WithLabelValues(event.Cluster).Inc()
LogEvent(event, "push_queue")
if !EventQueue.PushFront(event) {
logger.Warningf("event_push_queue: queue is full, event:%+v", event)
}
}
func (arc *AlertRuleContext) Stop() {
logger.Infof("%s stopped", arc.Key())
close(arc.quit)
}
func (arc *AlertRuleContext) recoverAlertCurEventFromDb() {
arc.pendings = NewAlertCurEventMap(nil)
curEvents, err := models.AlertCurEventGetByRuleIdAndCluster(arc.rule.Id, arc.cluster)
if err != nil {
logger.Errorf("recover event from db for rule:%s failed, err:%s", arc.Key(), err)
arc.fires = NewAlertCurEventMap(nil)
return
}
fireMap := make(map[string]*models.AlertCurEvent)
for _, event := range curEvents {
event.DB2Mem()
fireMap[event.Hash] = event
}
arc.fires = NewAlertCurEventMap(fireMap)
}
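Two timing decisions in this removed file are easy to misread: when a pending event has lasted long enough to fire (`handleEvent`), and when an already-fired event should be notified again (`fireEvent`). The sketch below restates both as pure functions. The names `pendingElapsed` and `shouldNotifyAgain` and the parameter types are assumptions made for illustration; the arithmetic follows the diff.

```go
package main

import "fmt"

// pendingElapsed mirrors the for-duration check in handleEvent: the event
// fires once the time since its first trigger, plus one evaluation interval,
// reaches the configured duration. All values are in seconds.
func pendingElapsed(lastEval, firstTrigger int64, evalInterval, forDuration int) bool {
	return lastEval-firstTrigger+int64(evalInterval) >= int64(forDuration)
}

// shouldNotifyAgain mirrors the repeat branch of fireEvent for an event that
// has already fired. repeatStepMin is in minutes; maxNumber == 0 means no cap.
func shouldNotifyAgain(lastEval, lastSent int64, repeatStepMin, maxNumber, curNumber int) bool {
	if repeatStepMin == 0 {
		return false // repeat notifications are disabled
	}
	if lastEval <= lastSent+int64(repeatStepMin)*60 {
		return false // still inside the silence period
	}
	return maxNumber == 0 || curNumber < maxNumber
}

func main() {
	fmt.Println(pendingElapsed(145, 100, 15, 60))
	fmt.Println(shouldNotifyAgain(4000, 0, 60, 3, 3))
}
```

Adding one evaluation interval in `pendingElapsed` means an event can fire one evaluation earlier than the raw elapsed time alone would allow.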

View File

@@ -1,189 +0,0 @@
package engine
import (
"fmt"
"sort"
"strings"
"sync"
"github.com/toolkits/pkg/str"
"github.com/didi/nightingale/v5/src/models"
"github.com/didi/nightingale/v5/src/server/common/conv"
"github.com/didi/nightingale/v5/src/server/memsto"
)
type AlertCurEventMap struct {
sync.RWMutex
Data map[string]*models.AlertCurEvent
}
func (a *AlertCurEventMap) SetAll(data map[string]*models.AlertCurEvent) {
a.Lock()
defer a.Unlock()
a.Data = data
}
func (a *AlertCurEventMap) Set(key string, value *models.AlertCurEvent) {
a.Lock()
defer a.Unlock()
a.Data[key] = value
}
func (a *AlertCurEventMap) Get(key string) (*models.AlertCurEvent, bool) {
a.RLock()
defer a.RUnlock()
event, exists := a.Data[key]
return event, exists
}
func (a *AlertCurEventMap) UpdateLastEvalTime(key string, lastEvalTime int64) {
a.Lock()
defer a.Unlock()
event, exists := a.Data[key]
if !exists {
return
}
event.LastEvalTime = lastEvalTime
}
func (a *AlertCurEventMap) Delete(key string) {
a.Lock()
defer a.Unlock()
delete(a.Data, key)
}
func (a *AlertCurEventMap) Keys() []string {
a.RLock()
defer a.RUnlock()
keys := make([]string, 0, len(a.Data))
for k := range a.Data {
keys = append(keys, k)
}
return keys
}
func (a *AlertCurEventMap) GetAll() map[string]*models.AlertCurEvent {
a.RLock()
defer a.RUnlock()
return a.Data
}
func NewAlertCurEventMap(data map[string]*models.AlertCurEvent) *AlertCurEventMap {
if data == nil {
return &AlertCurEventMap{
Data: make(map[string]*models.AlertCurEvent),
}
}
return &AlertCurEventMap{
Data: data,
}
}
// AlertVector holds the alerting context of a single alert event
type AlertVector struct {
Ctx *AlertRuleContext
Rule *models.AlertRule
Vector conv.Vector
From string
tagsMap map[string]string
tagsArr []string
target string
targetNote string
groupName string
}
func NewAlertVector(ctx *AlertRuleContext, rule *models.AlertRule, vector conv.Vector, from string) *AlertVector {
if rule == nil {
rule = ctx.rule
}
av := &AlertVector{
Ctx: ctx,
Rule: rule,
Vector: vector,
From: from,
}
av.fillTags()
av.mayHandleIdent()
av.mayHandleGroup()
return av
}
func (av *AlertVector) Hash() string {
return str.MD5(fmt.Sprintf("%d_%s_%s", av.Rule.Id, av.Vector.Key, av.Ctx.cluster))
}
func (av *AlertVector) fillTags() {
// handle series tags
tagsMap := make(map[string]string)
for label, value := range av.Vector.Labels {
tagsMap[string(label)] = string(value)
}
// handle rule tags
for _, tag := range av.Rule.AppendTagsJSON {
arr := strings.SplitN(tag, "=", 2)
tagsMap[arr[0]] = arr[1]
}
tagsMap["rulename"] = av.Rule.Name
av.tagsMap = tagsMap
// handle tagsArr
av.tagsArr = labelMapToArr(tagsMap)
}
func (av *AlertVector) mayHandleIdent() {
// handle ident
if ident, has := av.tagsMap["ident"]; has {
if target, exists := memsto.TargetCache.Get(ident); exists {
av.target = target.Ident
av.targetNote = target.Note
}
}
}
func (av *AlertVector) mayHandleGroup() {
// handle bg
bg := memsto.BusiGroupCache.GetByBusiGroupId(av.Rule.GroupId)
if bg != nil {
av.groupName = bg.Name
}
}
func (av *AlertVector) BuildEvent(now int64) *models.AlertCurEvent {
event := av.Rule.GenerateNewEvent()
event.TriggerTime = av.Vector.Timestamp
event.TagsMap = av.tagsMap
event.Cluster = av.Ctx.cluster
event.Hash = av.Hash()
event.TargetIdent = av.target
event.TargetNote = av.targetNote
event.TriggerValue = av.Vector.ReadableValue()
event.TagsJSON = av.tagsArr
event.GroupName = av.groupName
event.Tags = strings.Join(av.tagsArr, ",,")
event.IsRecovered = false
if av.From == "inner" {
event.LastEvalTime = now
} else {
event.LastEvalTime = event.TriggerTime
}
return event
}
func labelMapToArr(m map[string]string) []string {
numLabels := len(m)
labelStrings := make([]string, 0, numLabels)
for label, value := range m {
labelStrings = append(labelStrings, fmt.Sprintf("%s=%s", label, value))
}
if numLabels > 1 {
sort.Strings(labelStrings)
}
return labelStrings
}

View File

@@ -1,100 +0,0 @@
package engine
import (
"context"
"fmt"
"strings"
"time"
"github.com/toolkits/pkg/logger"
"github.com/toolkits/pkg/str"
"github.com/didi/nightingale/v5/src/models"
"github.com/didi/nightingale/v5/src/server/common/conv"
"github.com/didi/nightingale/v5/src/server/config"
"github.com/didi/nightingale/v5/src/server/writer"
)
type RecordRuleContext struct {
cluster string
quit chan struct{}
rule *models.RecordingRule
}
func NewRecordRuleContext(rule *models.RecordingRule, cluster string) *RecordRuleContext {
return &RecordRuleContext{
cluster: cluster,
quit: make(chan struct{}),
rule: rule,
}
}
func (rrc *RecordRuleContext) Key() string {
return fmt.Sprintf("record-%s-%d", rrc.cluster, rrc.rule.Id)
}
func (rrc *RecordRuleContext) Hash() string {
return str.MD5(fmt.Sprintf("%d_%d_%s_%s",
rrc.rule.Id,
rrc.rule.PromEvalInterval,
rrc.rule.PromQl,
rrc.cluster,
))
}
func (rrc *RecordRuleContext) Prepare() {}
func (rrc *RecordRuleContext) Start() {
logger.Infof("eval:%s started", rrc.Key())
interval := rrc.rule.PromEvalInterval
if interval <= 0 {
interval = 10
}
go func() {
for {
select {
case <-rrc.quit:
return
default:
rrc.Eval()
time.Sleep(time.Duration(interval) * time.Second)
}
}
}()
}
func (rrc *RecordRuleContext) Eval() {
promql := strings.TrimSpace(rrc.rule.PromQl)
if promql == "" {
logger.Errorf("eval:%s promql is blank", rrc.Key())
return
}
if config.ReaderClients.IsNil(rrc.cluster) {
logger.Errorf("eval:%s reader client is nil", rrc.Key())
return
}
value, warnings, err := config.ReaderClients.GetCli(rrc.cluster).Query(context.Background(), promql, time.Now())
if err != nil {
logger.Errorf("eval:%s promql:%s, error:%v", rrc.Key(), promql, err)
return
}
if len(warnings) > 0 {
logger.Errorf("eval:%s promql:%s, warnings:%v", rrc.Key(), promql, warnings)
return
}
ts := conv.ConvertToTimeSeries(value, rrc.rule)
if len(ts) != 0 {
for _, v := range ts {
writer.Writers.PushSample(rrc.rule.Name, v, rrc.cluster)
}
}
}
func (rrc *RecordRuleContext) Stop() {
logger.Infof("%s stopped", rrc.Key())
close(rrc.quit)
}

src/server/engine/worker.go Normal file
View File

@@ -0,0 +1,575 @@
package engine
import (
"context"
"fmt"
"math/rand"
"sort"
"strings"
"time"
"github.com/didi/nightingale/v5/src/server/writer"
"github.com/prometheus/common/model"
"github.com/toolkits/pkg/logger"
"github.com/toolkits/pkg/net/httplib"
"github.com/toolkits/pkg/str"
"github.com/didi/nightingale/v5/src/models"
"github.com/didi/nightingale/v5/src/pkg/prom"
"github.com/didi/nightingale/v5/src/server/common/conv"
"github.com/didi/nightingale/v5/src/server/config"
"github.com/didi/nightingale/v5/src/server/memsto"
"github.com/didi/nightingale/v5/src/server/naming"
"github.com/didi/nightingale/v5/src/server/reader"
promstat "github.com/didi/nightingale/v5/src/server/stat"
)
func loopFilterRules(ctx context.Context) {
// wait for samples
time.Sleep(time.Duration(config.C.EngineDelay) * time.Second)
duration := time.Duration(9000) * time.Millisecond
for {
select {
case <-ctx.Done():
return
case <-time.After(duration):
filterRules()
filterRecordingRules()
}
}
}
func filterRules() {
ids := memsto.AlertRuleCache.GetRuleIds()
logger.Debugf("AlertRuleCache.GetRuleIds success, ids.len: %d", len(ids))
count := len(ids)
mines := make([]int64, 0, count)
for i := 0; i < count; i++ {
node, err := naming.HashRing.GetNode(fmt.Sprint(ids[i]))
if err != nil {
logger.Warning("failed to get node from hashring:", err)
continue
}
if node == config.C.Heartbeat.Endpoint {
mines = append(mines, ids[i])
}
}
Workers.Build(mines)
}
type RuleEval struct {
rule *models.AlertRule
fires map[string]*models.AlertCurEvent
pendings map[string]*models.AlertCurEvent
quit chan struct{}
}
func (r RuleEval) Stop() {
logger.Infof("rule_eval:%d stopping", r.RuleID())
close(r.quit)
}
func (r RuleEval) RuleID() int64 {
return r.rule.Id
}
func (r RuleEval) Start() {
logger.Infof("rule_eval:%d started", r.RuleID())
for {
select {
case <-r.quit:
// logger.Infof("rule_eval:%d stopped", r.RuleID())
return
default:
r.Work()
logger.Debugf("rule executed, rule_id=%d", r.RuleID())
interval := r.rule.PromEvalInterval
if interval <= 0 {
interval = 10
}
time.Sleep(time.Duration(interval) * time.Second)
}
}
}
type AnomalyPoint struct {
Data model.Matrix `json:"data"`
Err string `json:"error"`
}
func (r RuleEval) Work() {
promql := strings.TrimSpace(r.rule.PromQl)
if promql == "" {
logger.Errorf("rule_eval:%d promql is blank", r.RuleID())
return
}
var value model.Value
var err error
if r.rule.Algorithm == "" {
var warnings prom.Warnings
value, warnings, err = reader.Client.Query(context.Background(), promql, time.Now())
if err != nil {
logger.Errorf("rule_eval:%d promql:%s, error:%v", r.RuleID(), promql, err)
notifyToMaintainer(err, "failed to query prometheus")
return
}
if len(warnings) > 0 {
logger.Errorf("rule_eval:%d promql:%s, warnings:%v", r.RuleID(), promql, warnings)
return
}
} else {
var res AnomalyPoint
count := len(config.C.AnomalyDataApi)
for _, i := range rand.Perm(count) {
url := fmt.Sprintf("%s?rid=%d", config.C.AnomalyDataApi[i], r.rule.Id)
err = httplib.Get(url).SetTimeout(time.Duration(3000) * time.Millisecond).ToJSON(&res)
if err != nil {
logger.Errorf("curl %s fail: %v", url, err)
continue
}
if res.Err != "" {
logger.Errorf("curl %s fail: %s", url, res.Err)
continue
}
value = res.Data
logger.Debugf("curl %s get: %+v", url, res.Data)
}
}
r.judge(conv.ConvertVectors(value))
}
type WorkersType struct {
rules map[string]RuleEval
recordRules map[string]RecordingRuleEval
}
var Workers = &WorkersType{rules: make(map[string]RuleEval), recordRules: make(map[string]RecordingRuleEval)}
func (ws *WorkersType) Build(rids []int64) {
rules := make(map[string]*models.AlertRule)
for i := 0; i < len(rids); i++ {
rule := memsto.AlertRuleCache.Get(rids[i])
if rule == nil {
continue
}
hash := str.MD5(fmt.Sprintf("%d_%d_%s",
rule.Id,
rule.PromEvalInterval,
rule.PromQl,
))
rules[hash] = rule
}
// stop old
for hash := range Workers.rules {
if _, has := rules[hash]; !has {
Workers.rules[hash].Stop()
delete(Workers.rules, hash)
}
}
// start new
for hash := range rules {
if _, has := Workers.rules[hash]; has {
// already exists
continue
}
elst, err := models.AlertCurEventGetByRule(rules[hash].Id)
if err != nil {
logger.Errorf("worker_build: AlertCurEventGetByRule failed: %v", err)
continue
}
firemap := make(map[string]*models.AlertCurEvent)
for i := 0; i < len(elst); i++ {
elst[i].DB2Mem()
firemap[elst[i].Hash] = elst[i]
}
re := RuleEval{
rule: rules[hash],
quit: make(chan struct{}),
fires: firemap,
pendings: make(map[string]*models.AlertCurEvent),
}
go re.Start()
Workers.rules[hash] = re
}
}
func (ws *WorkersType) BuildRe(rids []int64) {
rules := make(map[string]*models.RecordingRule)
for i := 0; i < len(rids); i++ {
rule := memsto.RecordingRuleCache.Get(rids[i])
if rule == nil {
continue
}
if rule.Disabled == 1 {
continue
}
hash := str.MD5(fmt.Sprintf("%d_%d_%s_%s",
rule.Id,
rule.PromEvalInterval,
rule.PromQl,
rule.AppendTags,
))
rules[hash] = rule
}
// stop old
for hash := range Workers.recordRules {
if _, has := rules[hash]; !has {
Workers.recordRules[hash].Stop()
delete(Workers.recordRules, hash)
}
}
// start new
for hash := range rules {
if _, has := Workers.recordRules[hash]; has {
// already exists
continue
}
re := RecordingRuleEval{
rule: rules[hash],
quit: make(chan struct{}),
}
go re.Start()
Workers.recordRules[hash] = re
}
}
func (r RuleEval) judge(vectors []conv.Vector) {
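// The rule's configuration may have changed, e.g. alert receivers or callbacks.
// Such changes do not restart the worker, but they do affect alert handling,
// so fetch the latest rule from memsto.AlertRuleCache and overwrite the local copy.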
curRule := memsto.AlertRuleCache.Get(r.rule.Id)
if curRule == nil {
return
}
r.rule = curRule
count := len(vectors)
alertingKeys := make(map[string]struct{})
now := time.Now().Unix()
for i := 0; i < count; i++ {
// compute hash
hash := str.MD5(fmt.Sprintf("%d_%s", r.rule.Id, vectors[i].Key))
alertingKeys[hash] = struct{}{}
// rule disabled in this time span?
if isNoneffective(vectors[i].Timestamp, r.rule) {
continue
}
// handle series tags
tagsMap := make(map[string]string)
for label, value := range vectors[i].Labels {
tagsMap[string(label)] = string(value)
}
// handle rule tags
for _, tag := range r.rule.AppendTagsJSON {
arr := strings.SplitN(tag, "=", 2)
tagsMap[arr[0]] = arr[1]
}
tagsMap["rulename"] = r.rule.Name
// handle target note
targetIdent, has := vectors[i].Labels["ident"]
targetNote := ""
if has {
target, exists := memsto.TargetCache.Get(string(targetIdent))
if exists {
targetNote = target.Note
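// For alert events that carry an ident, check whether the ident's business group (BG) matches the rule's BG.
// If the rule is set to take effect only within its own BG, machines in other BGs must not raise alerts from it.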
if r.rule.EnableInBG == 1 && target.GroupId != r.rule.GroupId {
continue
}
}
}
event := &models.AlertCurEvent{
TriggerTime: vectors[i].Timestamp,
TagsMap: tagsMap,
GroupId: r.rule.GroupId,
RuleName: r.rule.Name,
}
bg := memsto.BusiGroupCache.GetByBusiGroupId(r.rule.GroupId)
if bg != nil {
event.GroupName = bg.Name
}
// isMuted only need TriggerTime RuleName and TagsMap
if isMuted(event) {
logger.Infof("event_muted: rule_id=%d %s", r.rule.Id, vectors[i].Key)
continue
}
tagsArr := labelMapToArr(tagsMap)
sort.Strings(tagsArr)
event.Cluster = r.rule.Cluster
event.Hash = hash
event.RuleId = r.rule.Id
event.RuleName = r.rule.Name
event.RuleNote = r.rule.Note
event.RuleProd = r.rule.Prod
event.RuleAlgo = r.rule.Algorithm
event.Severity = r.rule.Severity
event.PromForDuration = r.rule.PromForDuration
event.PromQl = r.rule.PromQl
event.PromEvalInterval = r.rule.PromEvalInterval
event.Callbacks = r.rule.Callbacks
event.CallbacksJSON = r.rule.CallbacksJSON
event.RunbookUrl = r.rule.RunbookUrl
event.NotifyRecovered = r.rule.NotifyRecovered
event.NotifyChannels = r.rule.NotifyChannels
event.NotifyChannelsJSON = r.rule.NotifyChannelsJSON
event.NotifyGroups = r.rule.NotifyGroups
event.NotifyGroupsJSON = r.rule.NotifyGroupsJSON
event.TargetIdent = string(targetIdent)
event.TargetNote = targetNote
event.TriggerValue = readableValue(vectors[i].Value)
event.TagsJSON = tagsArr
event.Tags = strings.Join(tagsArr, ",,")
event.IsRecovered = false
event.LastEvalTime = now
r.handleNewEvent(event)
}
// handle recovered events
r.recoverRule(alertingKeys, now)
}
func readableValue(value float64) string {
ret := fmt.Sprintf("%.5f", value)
ret = strings.TrimRight(ret, "0")
return strings.TrimRight(ret, ".")
}
func labelMapToArr(m map[string]string) []string {
numLabels := len(m)
labelStrings := make([]string, 0, numLabels)
for label, value := range m {
labelStrings = append(labelStrings, fmt.Sprintf("%s=%s", label, value))
}
if numLabels > 1 {
sort.Strings(labelStrings)
}
return labelStrings
}
func (r RuleEval) handleNewEvent(event *models.AlertCurEvent) {
if event.PromForDuration == 0 {
r.fireEvent(event)
return
}
_, has := r.pendings[event.Hash]
if has {
r.pendings[event.Hash].LastEvalTime = event.LastEvalTime
} else {
r.pendings[event.Hash] = event
}
if r.pendings[event.Hash].LastEvalTime-r.pendings[event.Hash].TriggerTime+int64(event.PromEvalInterval) >= int64(event.PromForDuration) {
r.fireEvent(event)
}
}
func (r RuleEval) fireEvent(event *models.AlertCurEvent) {
if fired, has := r.fires[event.Hash]; has {
r.fires[event.Hash].LastEvalTime = event.LastEvalTime
if r.rule.NotifyRepeatStep == 0 {
// Repeat notifications are not wanted, so return; nothing to do
return
}
// An alert was already sent; whether to send again depends on whether the repeat-notification interval has elapsed
if event.LastEvalTime > fired.LastSentTime+int64(r.rule.NotifyRepeatStep)*60 {
if r.rule.NotifyMaxNumber == 0 {
// A maximum of 0 means the number of notifications is unlimited, so keep sending
event.NotifyCurNumber = fired.NotifyCurNumber + 1
r.pushEventToQueue(event)
} else {
// A maximum is configured, so check how many notifications were sent and whether the limit has been reached
if fired.NotifyCurNumber >= r.rule.NotifyMaxNumber {
return
} else {
event.NotifyCurNumber = fired.NotifyCurNumber + 1
r.pushEventToQueue(event)
}
}
}
} else {
event.NotifyCurNumber = 1
r.pushEventToQueue(event)
}
}
func (r RuleEval) recoverRule(alertingKeys map[string]struct{}, now int64) {
for hash := range r.pendings {
if _, has := alertingKeys[hash]; has {
continue
}
delete(r.pendings, hash)
}
for hash, event := range r.fires {
if _, has := alertingKeys[hash]; has {
continue
}
// If a recovery observation window is configured, do not recover immediately
if r.rule.RecoverDuration > 0 && now-event.LastEvalTime < r.rule.RecoverDuration {
continue
}
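// No vector crossing the threshold was returned, so assume this series has recovered.
// There is no way to tell whether Prometheus has data that did not meet the threshold, or whether points were lost and nothing could be returned.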
delete(r.fires, hash)
delete(r.pendings, hash)
event.IsRecovered = true
event.LastEvalTime = now
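// The recovery may be caused by an edited PromQL, so the event should carry the latest PromQL; otherwise users would be confused.
// Other rule fields may have changed as well, so refresh all of them.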
event.RuleName = r.rule.Name
event.RuleNote = r.rule.Note
event.RuleProd = r.rule.Prod
event.RuleAlgo = r.rule.Algorithm
event.Severity = r.rule.Severity
event.PromForDuration = r.rule.PromForDuration
event.PromQl = r.rule.PromQl
event.PromEvalInterval = r.rule.PromEvalInterval
event.Callbacks = r.rule.Callbacks
event.CallbacksJSON = r.rule.CallbacksJSON
event.RunbookUrl = r.rule.RunbookUrl
event.NotifyRecovered = r.rule.NotifyRecovered
event.NotifyChannels = r.rule.NotifyChannels
event.NotifyChannelsJSON = r.rule.NotifyChannelsJSON
event.NotifyGroups = r.rule.NotifyGroups
event.NotifyGroupsJSON = r.rule.NotifyGroupsJSON
r.pushEventToQueue(event)
}
}
func (r RuleEval) pushEventToQueue(event *models.AlertCurEvent) {
if !event.IsRecovered {
event.LastSentTime = event.LastEvalTime
r.fires[event.Hash] = event
}
promstat.CounterAlertsTotal.WithLabelValues(config.C.ClusterName).Inc()
LogEvent(event, "push_queue")
if !EventQueue.PushFront(event) {
logger.Warningf("event_push_queue: queue is full")
}
}
func filterRecordingRules() {
ids := memsto.RecordingRuleCache.GetRuleIds()
count := len(ids)
mines := make([]int64, 0, count)
for i := 0; i < count; i++ {
node, err := naming.HashRing.GetNode(fmt.Sprint(ids[i]))
if err != nil {
logger.Warning("failed to get node from hashring:", err)
continue
}
if node == config.C.Heartbeat.Endpoint {
mines = append(mines, ids[i])
}
}
Workers.BuildRe(mines)
}
type RecordingRuleEval struct {
rule *models.RecordingRule
quit chan struct{}
}
func (r RecordingRuleEval) Stop() {
logger.Infof("recording_rule_eval:%d stopping", r.RuleID())
close(r.quit)
}
func (r RecordingRuleEval) RuleID() int64 {
return r.rule.Id
}
func (r RecordingRuleEval) Start() {
logger.Infof("recording_rule_eval:%d started", r.RuleID())
for {
select {
case <-r.quit:
// logger.Infof("rule_eval:%d stopped", r.RuleID())
return
default:
r.Work()
interval := r.rule.PromEvalInterval
if interval <= 0 {
interval = 10
}
time.Sleep(time.Duration(interval) * time.Second)
}
}
}
func (r RecordingRuleEval) Work() {
promql := strings.TrimSpace(r.rule.PromQl)
if promql == "" {
logger.Errorf("recording_rule_eval:%d promql is blank", r.RuleID())
return
}
value, warnings, err := reader.Client.Query(context.Background(), promql, time.Now())
if err != nil {
logger.Errorf("recording_rule_eval:%d promql:%s, error:%v", r.RuleID(), promql, err)
return
}
if len(warnings) > 0 {
logger.Errorf("recording_rule_eval:%d promql:%s, warnings:%v", r.RuleID(), promql, warnings)
return
}
ts := conv.ConvertToTimeSeries(value, r.rule)
if len(ts) != 0 {
for _, v := range ts {
writer.Writers.PushSample(r.rule.Name, v)
}
}
}
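
Review note: two small pieces of `worker.go` are easy to check in isolation. The sketch below copies `readableValue` from the diff and restates the pending-to-firing condition from `handleNewEvent` with plain integers. `shouldFire` and its parameter names are hypothetical, introduced only for illustration; all values are in seconds.

```go
package main

import (
	"fmt"
	"strings"
)

// readableValue mirrors the helper in the diff: format with five decimals,
// then trim trailing zeros and a dangling decimal point.
func readableValue(value float64) string {
	ret := fmt.Sprintf("%.5f", value)
	ret = strings.TrimRight(ret, "0")
	return strings.TrimRight(ret, ".")
}

// shouldFire restates the condition used in handleNewEvent (hypothetical
// helper): an event fires immediately when forDuration is 0, otherwise once
// the time since the first trigger plus one evaluation interval reaches
// forDuration.
func shouldFire(firstTrigger, lastEval, evalInterval, forDuration int64) bool {
	if forDuration == 0 {
		return true
	}
	return lastEval-firstTrigger+evalInterval >= forDuration
}

func main() {
	fmt.Println(readableValue(1.5), readableValue(3), readableValue(1200))
	fmt.Println(shouldFire(100, 100, 15, 60)) // first evaluation: 0+15 < 60
	fmt.Println(shouldFire(100, 145, 15, 60)) // 45+15 >= 60
}
```

Because the decimal point stops the first `TrimRight`, integer values such as 1200 keep their zeros before the point.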

View File

@@ -41,15 +41,11 @@ func toRedis() {
return
}
if config.ReaderClients.IsNil(config.C.ClusterName) {
return
}
now := time.Now().Unix()
// clean old idents
for key, at := range items {
if at.(int64) < now-config.C.NoData.Interval {
if at.(int64) < now-10 {
Idents.Remove(key)
} else {
// use now as timestamp to redis
@@ -96,8 +92,7 @@ func loopPushMetrics(ctx context.Context) {
}
func pushMetrics() {
clusterName := config.C.ClusterName
isLeader, err := naming.IamLeader(clusterName)
isLeader, err := naming.IamLeader()
if err != nil {
logger.Errorf("handle_idents: %v", err)
return
@@ -109,7 +104,7 @@ func pushMetrics() {
}
// get all the target heartbeat timestamp
ret, err := storage.Redis.HGetAll(context.Background(), redisKey(clusterName)).Result()
ret, err := storage.Redis.HGetAll(context.Background(), redisKey(config.C.ClusterName)).Result()
if err != nil {
logger.Errorf("handle_idents: redis hgetall fail: %v", err)
return
@@ -126,7 +121,7 @@ func pushMetrics() {
}
if now-clock > dur {
clearDeadIdent(context.Background(), clusterName, ident)
clearDeadIdent(context.Background(), config.C.ClusterName, ident)
} else {
actives[ident] = struct{}{}
}
@@ -158,7 +153,7 @@ func pushMetrics() {
if !has {
// target not exists
target = &models.Target{
Cluster: clusterName,
Cluster: config.C.ClusterName,
Ident: active,
Tags: "",
TagsJSON: []string{},
@@ -179,9 +174,6 @@ func pushMetrics() {
// Pass actives to TargetCache to find any targets outside the active set; for those, report target_up = 0
deads := memsto.TargetCache.GetDeads(actives)
for ident, dead := range deads {
if ident == "" {
continue
}
// build metrics
pt := &prompb.TimeSeries{}
pt.Samples = append(pt.Samples, prompb.Sample{

View File

@@ -27,15 +27,6 @@ var AlertMuteCache = AlertMuteCacheType{
mutes: make(map[int64][]*models.AlertMute),
}
func (amc *AlertMuteCacheType) Reset() {
amc.Lock()
defer amc.Unlock()
amc.statTotal = -1
amc.statLastUpdated = -1
amc.mutes = make(map[int64][]*models.AlertMute)
}
func (amc *AlertMuteCacheType) StatChanged(total, lastUpdated int64) bool {
if amc.statTotal == total && amc.statLastUpdated == lastUpdated {
return false
@@ -99,26 +90,19 @@ func loopSyncAlertMutes() {
func syncAlertMutes() error {
start := time.Now()
clusterNames := config.ReaderClients.GetClusterNames()
if len(clusterNames) == 0 {
AlertRuleCache.Reset()
logger.Warning("cluster is blank")
return nil
}
stat, err := models.AlertMuteStatistics("")
stat, err := models.AlertMuteStatistics(config.C.ClusterName)
if err != nil {
return errors.WithMessage(err, "failed to exec AlertMuteStatistics")
}
if !AlertMuteCache.StatChanged(stat.Total, stat.LastUpdated) {
promstat.GaugeCronDuration.WithLabelValues("sync_alert_mutes").Set(0)
promstat.GaugeSyncNumber.WithLabelValues("sync_alert_mutes").Set(0)
promstat.GaugeCronDuration.WithLabelValues(config.C.ClusterName, "sync_alert_mutes").Set(0)
promstat.GaugeSyncNumber.WithLabelValues(config.C.ClusterName, "sync_alert_mutes").Set(0)
logger.Debug("alert mutes not changed")
return nil
}
lst, err := models.AlertMuteGetsByCluster("")
lst, err := models.AlertMuteGetsByCluster(config.C.ClusterName)
if err != nil {
return errors.WithMessage(err, "failed to exec AlertMuteGetsByCluster")
}
@@ -138,8 +122,8 @@ func syncAlertMutes() error {
AlertMuteCache.Set(oks, stat.Total, stat.LastUpdated)
ms := time.Since(start).Milliseconds()
promstat.GaugeCronDuration.WithLabelValues("sync_alert_mutes").Set(float64(ms))
promstat.GaugeSyncNumber.WithLabelValues("sync_alert_mutes").Set(float64(len(lst)))
promstat.GaugeCronDuration.WithLabelValues(config.C.ClusterName, "sync_alert_mutes").Set(float64(ms))
promstat.GaugeSyncNumber.WithLabelValues(config.C.ClusterName, "sync_alert_mutes").Set(float64(len(lst)))
logger.Infof("timer: sync mutes done, cost: %dms, number: %d", ms, len(lst))
return nil
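
Review note: each `sync*` function in these cache files follows the same pattern: read a cheap statistic (row count and last-updated timestamp), skip the full reload if it matches what the cache last saw, otherwise reload and store the new statistic. A minimal sketch of that pattern follows. `statCache`, `newStatCache`, and `sync` with its `load` callback are hypothetical; initialising the statistics to -1 and taking a read lock in `StatChanged` are assumptions of this sketch, not something the hunks above confirm.

```go
package main

import (
	"fmt"
	"sync"
)

// statCache is a hypothetical cache guarded by a (total, lastUpdated) statistic.
type statCache struct {
	sync.RWMutex
	statTotal       int64
	statLastUpdated int64
	items           map[int64]string
}

func newStatCache() *statCache {
	// -1 is assumed here so that the first sync always counts as a change.
	return &statCache{statTotal: -1, statLastUpdated: -1, items: map[int64]string{}}
}

// StatChanged reports whether the statistic differs from the cached one.
func (c *statCache) StatChanged(total, lastUpdated int64) bool {
	c.RLock()
	defer c.RUnlock()
	return c.statTotal != total || c.statLastUpdated != lastUpdated
}

// Set replaces the cached items and remembers the statistic they belong to.
func (c *statCache) Set(items map[int64]string, total, lastUpdated int64) {
	c.Lock()
	defer c.Unlock()
	c.items = items
	c.statTotal = total
	c.statLastUpdated = lastUpdated
}

// sync calls load only when the statistic changed; it returns true if a
// reload happened.
func (c *statCache) sync(total, lastUpdated int64, load func() map[int64]string) bool {
	if !c.StatChanged(total, lastUpdated) {
		return false
	}
	c.Set(load(), total, lastUpdated)
	return true
}

func main() {
	cache := newStatCache()
	load := func() map[int64]string { return map[int64]string{1: "rule-a", 2: "rule-b"} }
	fmt.Println(cache.sync(2, 1658483785, load)) // statistic differs, reload
	fmt.Println(cache.sync(2, 1658483785, load)) // unchanged, skipped
}
```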

View File

@@ -27,15 +27,6 @@ var AlertRuleCache = AlertRuleCacheType{
rules: make(map[int64]*models.AlertRule),
}
func (arc *AlertRuleCacheType) Reset() {
arc.Lock()
defer arc.Unlock()
arc.statTotal = -1
arc.statLastUpdated = -1
arc.rules = make(map[int64]*models.AlertRule)
}
func (arc *AlertRuleCacheType) StatChanged(total, lastUpdated int64) bool {
if arc.statTotal == total && arc.statLastUpdated == lastUpdated {
return false
@@ -96,26 +87,19 @@ func loopSyncAlertRules() {
func syncAlertRules() error {
start := time.Now()
clusterNames := config.ReaderClients.GetClusterNames()
if len(clusterNames) == 0 {
AlertRuleCache.Reset()
logger.Warning("cluster is blank")
return nil
}
stat, err := models.AlertRuleStatistics("")
stat, err := models.AlertRuleStatistics(config.C.ClusterName)
if err != nil {
return errors.WithMessage(err, "failed to exec AlertRuleStatistics")
}
if !AlertRuleCache.StatChanged(stat.Total, stat.LastUpdated) {
promstat.GaugeCronDuration.WithLabelValues("sync_alert_rules").Set(0)
promstat.GaugeSyncNumber.WithLabelValues("sync_alert_rules").Set(0)
promstat.GaugeCronDuration.WithLabelValues(config.C.ClusterName, "sync_alert_rules").Set(0)
promstat.GaugeSyncNumber.WithLabelValues(config.C.ClusterName, "sync_alert_rules").Set(0)
logger.Debug("alert rules not changed")
return nil
}
lst, err := models.AlertRuleGetsByCluster("")
lst, err := models.AlertRuleGetsByCluster(config.C.ClusterName)
if err != nil {
return errors.WithMessage(err, "failed to exec AlertRuleGetsByCluster")
}
@@ -128,8 +112,8 @@ func syncAlertRules() error {
AlertRuleCache.Set(m, stat.Total, stat.LastUpdated)
ms := time.Since(start).Milliseconds()
promstat.GaugeCronDuration.WithLabelValues("sync_alert_rules").Set(float64(ms))
promstat.GaugeSyncNumber.WithLabelValues("sync_alert_rules").Set(float64(len(m)))
promstat.GaugeCronDuration.WithLabelValues(config.C.ClusterName, "sync_alert_rules").Set(float64(ms))
promstat.GaugeSyncNumber.WithLabelValues(config.C.ClusterName, "sync_alert_rules").Set(float64(len(m)))
logger.Infof("timer: sync rules done, cost: %dms, number: %d", ms, len(m))
return nil

View File

@@ -27,15 +27,6 @@ var AlertSubscribeCache = AlertSubscribeCacheType{
subs: make(map[int64][]*models.AlertSubscribe),
}
func (c *AlertSubscribeCacheType) Reset() {
c.Lock()
defer c.Unlock()
c.statTotal = -1
c.statLastUpdated = -1
c.subs = make(map[int64][]*models.AlertSubscribe)
}
func (c *AlertSubscribeCacheType) StatChanged(total, lastUpdated int64) bool {
if c.statTotal == total && c.statLastUpdated == lastUpdated {
return false
@@ -102,26 +93,19 @@ func loopSyncAlertSubscribes() {
func syncAlertSubscribes() error {
start := time.Now()
clusterNames := config.ReaderClients.GetClusterNames()
if len(clusterNames) == 0 {
AlertSubscribeCache.Reset()
logger.Warning("cluster is blank")
return nil
}
stat, err := models.AlertSubscribeStatistics("")
stat, err := models.AlertSubscribeStatistics(config.C.ClusterName)
if err != nil {
return errors.WithMessage(err, "failed to exec AlertSubscribeStatistics")
}
if !AlertSubscribeCache.StatChanged(stat.Total, stat.LastUpdated) {
promstat.GaugeCronDuration.WithLabelValues("sync_alert_subscribes").Set(0)
promstat.GaugeSyncNumber.WithLabelValues("sync_alert_subscribes").Set(0)
promstat.GaugeCronDuration.WithLabelValues(config.C.ClusterName, "sync_alert_subscribes").Set(0)
promstat.GaugeSyncNumber.WithLabelValues(config.C.ClusterName, "sync_alert_subscribes").Set(0)
logger.Debug("alert subscribes not changed")
return nil
}
lst, err := models.AlertSubscribeGetsByCluster("")
lst, err := models.AlertSubscribeGetsByCluster(config.C.ClusterName)
if err != nil {
return errors.WithMessage(err, "failed to exec AlertSubscribeGetsByCluster")
}
@@ -141,8 +125,8 @@ func syncAlertSubscribes() error {
AlertSubscribeCache.Set(subs, stat.Total, stat.LastUpdated)
ms := time.Since(start).Milliseconds()
promstat.GaugeCronDuration.WithLabelValues("sync_alert_subscribes").Set(float64(ms))
promstat.GaugeSyncNumber.WithLabelValues("sync_alert_subscribes").Set(float64(len(lst)))
promstat.GaugeCronDuration.WithLabelValues(config.C.ClusterName, "sync_alert_subscribes").Set(float64(ms))
promstat.GaugeSyncNumber.WithLabelValues(config.C.ClusterName, "sync_alert_subscribes").Set(float64(len(lst)))
logger.Infof("timer: sync subscribes done, cost: %dms, number: %d", ms, len(lst))
return nil

View File

@@ -9,6 +9,7 @@ import (
"github.com/toolkits/pkg/logger"
"github.com/didi/nightingale/v5/src/models"
"github.com/didi/nightingale/v5/src/server/config"
promstat "github.com/didi/nightingale/v5/src/server/stat"
)
@@ -79,9 +80,8 @@ func syncBusiGroups() error {
}
if !BusiGroupCache.StatChanged(stat.Total, stat.LastUpdated) {
promstat.GaugeCronDuration.WithLabelValues("sync_busi_groups").Set(0)
promstat.GaugeSyncNumber.WithLabelValues("sync_busi_groups").Set(0)
promstat.GaugeCronDuration.WithLabelValues(config.C.ClusterName, "sync_busi_groups").Set(0)
promstat.GaugeSyncNumber.WithLabelValues(config.C.ClusterName, "sync_busi_groups").Set(0)
logger.Debug("busi_group not changed")
return nil
}
@@ -94,9 +94,8 @@ func syncBusiGroups() error {
BusiGroupCache.Set(m, stat.Total, stat.LastUpdated)
ms := time.Since(start).Milliseconds()
promstat.GaugeCronDuration.WithLabelValues("sync_busi_groups").Set(float64(ms))
promstat.GaugeSyncNumber.WithLabelValues("sync_busi_groups").Set(float64(len(m)))
promstat.GaugeCronDuration.WithLabelValues(config.C.ClusterName, "sync_busi_groups").Set(float64(ms))
promstat.GaugeSyncNumber.WithLabelValues(config.C.ClusterName, "sync_busi_groups").Set(float64(len(m)))
logger.Infof("timer: sync busi groups done, cost: %dms, number: %d", ms, len(m))
return nil

View File

@@ -1,44 +0,0 @@
package memsto
import (
"sync"
)
type LogSampleCacheType struct {
sync.RWMutex
m map[string]map[string]struct{} // map[labelName]map[labelValue]struct{}
}
var LogSampleCache = LogSampleCacheType{
m: make(map[string]map[string]struct{}),
}
func (l *LogSampleCacheType) Set(m map[string][]string) {
l.Lock()
for k, v := range m {
l.m[k] = make(map[string]struct{})
for _, vv := range v {
l.m[k][vv] = struct{}{}
}
}
l.Unlock()
}
func (l *LogSampleCacheType) Get() map[string]map[string]struct{} {
l.RLock()
defer l.RUnlock()
return l.m
}
func (l *LogSampleCacheType) Clean() {
l.Lock()
l.m = make(map[string]map[string]struct{})
l.Unlock()
}
func (l *LogSampleCacheType) Len() int {
l.RLock()
defer l.RUnlock()
return len(l.m)
}

View File

@@ -26,15 +26,6 @@ var RecordingRuleCache = RecordingRuleCacheType{
rules: make(map[int64]*models.RecordingRule),
}
func (rrc *RecordingRuleCacheType) Reset() {
rrc.Lock()
defer rrc.Unlock()
rrc.statTotal = -1
rrc.statLastUpdated = -1
rrc.rules = make(map[int64]*models.RecordingRule)
}
func (rrc *RecordingRuleCacheType) StatChanged(total, lastUpdated int64) bool {
if rrc.statTotal == total && rrc.statLastUpdated == lastUpdated {
return false
@@ -95,32 +86,19 @@ func loopSyncRecordingRules() {
func syncRecordingRules() error {
start := time.Now()
clusterNames := config.ReaderClients.GetClusterNames()
if len(clusterNames) == 0 {
RecordingRuleCache.Reset()
logger.Warning("cluster is blank")
return nil
}
var clusterName string
// With exactly one cluster, use single-cluster mode; with more than one, fetch all rules
if len(clusterNames) == 1 {
clusterName = clusterNames[0]
}
stat, err := models.RecordingRuleStatistics(clusterName)
stat, err := models.RecordingRuleStatistics(config.C.ClusterName)
if err != nil {
return errors.WithMessage(err, "failed to exec RecordingRuleStatistics")
}
if !RecordingRuleCache.StatChanged(stat.Total, stat.LastUpdated) {
promstat.GaugeCronDuration.WithLabelValues("sync_recording_rules").Set(0)
promstat.GaugeSyncNumber.WithLabelValues("sync_recording_rules").Set(0)
promstat.GaugeCronDuration.WithLabelValues(config.C.ClusterName, "sync_recording_rules").Set(0)
promstat.GaugeSyncNumber.WithLabelValues(config.C.ClusterName, "sync_recording_rules").Set(0)
logger.Debug("recording rules not changed")
return nil
}
lst, err := models.RecordingRuleGetsByCluster(clusterName)
lst, err := models.RecordingRuleGetsByCluster(config.C.ClusterName)
if err != nil {
return errors.WithMessage(err, "failed to exec RecordingRuleGetsByCluster")
}
@@ -133,8 +111,8 @@ func syncRecordingRules() error {
RecordingRuleCache.Set(m, stat.Total, stat.LastUpdated)
ms := time.Since(start).Milliseconds()
promstat.GaugeCronDuration.WithLabelValues("sync_recording_rules").Set(float64(ms))
promstat.GaugeSyncNumber.WithLabelValues("sync_recording_rules").Set(float64(len(m)))
promstat.GaugeCronDuration.WithLabelValues(config.C.ClusterName, "sync_recording_rules").Set(float64(ms))
promstat.GaugeSyncNumber.WithLabelValues(config.C.ClusterName, "sync_recording_rules").Set(float64(len(m)))
logger.Infof("timer: sync recording rules done, cost: %dms, number: %d", ms, len(m))
return nil

View File

@@ -31,15 +31,6 @@ var TargetCache = TargetCacheType{
targets: make(map[string]*models.Target),
}
func (tc *TargetCacheType) Reset() {
tc.Lock()
defer tc.Unlock()
tc.statTotal = -1
tc.statLastUpdated = -1
tc.targets = make(map[string]*models.Target)
}
func (tc *TargetCacheType) StatChanged(total, lastUpdated int64) bool {
if tc.statTotal == total && tc.statLastUpdated == lastUpdated {
return false
@@ -103,26 +94,19 @@ func loopSyncTargets() {
func syncTargets() error {
start := time.Now()
clusterName := config.C.ClusterName
if clusterName == "" {
TargetCache.Reset()
logger.Warning("cluster name is blank")
return nil
}
stat, err := models.TargetStatistics(clusterName)
stat, err := models.TargetStatistics(config.C.ClusterName)
if err != nil {
return errors.WithMessage(err, "failed to exec TargetStatistics")
}
if !TargetCache.StatChanged(stat.Total, stat.LastUpdated) {
promstat.GaugeCronDuration.WithLabelValues("sync_targets").Set(0)
promstat.GaugeSyncNumber.WithLabelValues("sync_targets").Set(0)
promstat.GaugeCronDuration.WithLabelValues(config.C.ClusterName, "sync_targets").Set(0)
promstat.GaugeSyncNumber.WithLabelValues(config.C.ClusterName, "sync_targets").Set(0)
logger.Debug("targets not changed")
return nil
}
lst, err := models.TargetGetsByCluster(clusterName)
lst, err := models.TargetGetsByCluster(config.C.ClusterName)
if err != nil {
return errors.WithMessage(err, "failed to exec TargetGetsByCluster")
}
@@ -145,8 +129,8 @@ func syncTargets() error {
TargetCache.Set(m, stat.Total, stat.LastUpdated)
ms := time.Since(start).Milliseconds()
promstat.GaugeCronDuration.WithLabelValues("sync_targets").Set(float64(ms))
promstat.GaugeSyncNumber.WithLabelValues("sync_targets").Set(float64(len(lst)))
promstat.GaugeCronDuration.WithLabelValues(config.C.ClusterName, "sync_targets").Set(float64(ms))
promstat.GaugeSyncNumber.WithLabelValues(config.C.ClusterName, "sync_targets").Set(float64(len(lst)))
logger.Infof("timer: sync targets done, cost: %dms, number: %d", ms, len(lst))
return nil

View File

@@ -9,6 +9,7 @@ import (
"github.com/toolkits/pkg/logger"
"github.com/didi/nightingale/v5/src/models"
"github.com/didi/nightingale/v5/src/server/config"
promstat "github.com/didi/nightingale/v5/src/server/stat"
)
@@ -124,9 +125,8 @@ func syncUsers() error {
}
if !UserCache.StatChanged(stat.Total, stat.LastUpdated) {
promstat.GaugeCronDuration.WithLabelValues("sync_users").Set(0)
promstat.GaugeSyncNumber.WithLabelValues("sync_users").Set(0)
promstat.GaugeCronDuration.WithLabelValues(config.C.ClusterName, "sync_users").Set(0)
promstat.GaugeSyncNumber.WithLabelValues(config.C.ClusterName, "sync_users").Set(0)
logger.Debug("users not changed")
return nil
}
@@ -144,9 +144,8 @@ func syncUsers() error {
UserCache.Set(m, stat.Total, stat.LastUpdated)
ms := time.Since(start).Milliseconds()
promstat.GaugeCronDuration.WithLabelValues("sync_users").Set(float64(ms))
promstat.GaugeSyncNumber.WithLabelValues("sync_users").Set(float64(len(m)))
promstat.GaugeCronDuration.WithLabelValues(config.C.ClusterName, "sync_users").Set(float64(ms))
promstat.GaugeSyncNumber.WithLabelValues(config.C.ClusterName, "sync_users").Set(float64(len(m)))
logger.Infof("timer: sync users done, cost: %dms, number: %d", ms, len(m))
return nil

View File

@@ -9,6 +9,7 @@ import (
"github.com/toolkits/pkg/logger"
"github.com/didi/nightingale/v5/src/models"
"github.com/didi/nightingale/v5/src/server/config"
promstat "github.com/didi/nightingale/v5/src/server/stat"
)
@@ -106,9 +107,8 @@ func syncUserGroups() error {
}
if !UserGroupCache.StatChanged(stat.Total, stat.LastUpdated) {
promstat.GaugeCronDuration.WithLabelValues("sync_user_groups").Set(0)
promstat.GaugeSyncNumber.WithLabelValues("sync_user_groups").Set(0)
promstat.GaugeCronDuration.WithLabelValues(config.C.ClusterName, "sync_user_groups").Set(0)
promstat.GaugeSyncNumber.WithLabelValues(config.C.ClusterName, "sync_user_groups").Set(0)
logger.Debug("user_group not changed")
return nil
}
@@ -145,9 +145,8 @@ func syncUserGroups() error {
UserGroupCache.Set(m, stat.Total, stat.LastUpdated)
ms := time.Since(start).Milliseconds()
promstat.GaugeCronDuration.WithLabelValues("sync_user_groups").Set(float64(ms))
promstat.GaugeSyncNumber.WithLabelValues("sync_user_groups").Set(float64(len(m)))
promstat.GaugeCronDuration.WithLabelValues(config.C.ClusterName, "sync_user_groups").Set(float64(ms))
promstat.GaugeSyncNumber.WithLabelValues(config.C.ClusterName, "sync_user_groups").Set(float64(len(m)))
logger.Infof("timer: sync user groups done, cost: %dms, number: %d", ms, len(m))
return nil

View File

@@ -9,56 +9,51 @@ import (
const NodeReplicas = 500
type ClusterHashRingType struct {
type ConsistentHashRing struct {
sync.RWMutex
Rings map[string]*consistent.Consistent
ring *consistent.Consistent
}
// for alert_rule sharding
var ClusterHashRing = ClusterHashRingType{Rings: make(map[string]*consistent.Consistent)}
var HashRing = NewConsistentHashRing(int32(NodeReplicas), []string{})
func NewConsistentHashRing(replicas int32, nodes []string) *consistent.Consistent {
ret := consistent.New()
ret.NumberOfReplicas = int(replicas)
func (chr *ConsistentHashRing) GetNode(pk string) (string, error) {
chr.RLock()
defer chr.RUnlock()
return chr.ring.Get(pk)
}
func (chr *ConsistentHashRing) Set(r *consistent.Consistent) {
chr.Lock()
defer chr.Unlock()
chr.ring = r
}
func (chr *ConsistentHashRing) GetRing() *consistent.Consistent {
chr.RLock()
defer chr.RUnlock()
return chr.ring
}
func NewConsistentHashRing(replicas int32, nodes []string) *ConsistentHashRing {
ret := &ConsistentHashRing{ring: consistent.New()}
ret.ring.NumberOfReplicas = int(replicas)
for i := 0; i < len(nodes); i++ {
ret.Add(nodes[i])
ret.ring.Add(nodes[i])
}
return ret
}
func RebuildConsistentHashRing(cluster string, nodes []string) {
func RebuildConsistentHashRing(nodes []string) {
r := consistent.New()
r.NumberOfReplicas = NodeReplicas
for i := 0; i < len(nodes); i++ {
r.Add(nodes[i])
}
ClusterHashRing.Set(cluster, r)
logger.Infof("hash ring %s rebuild %+v", cluster, r.Members())
}
HashRing.Set(r)
func (chr *ClusterHashRingType) GetNode(cluster, pk string) (string, error) {
chr.RLock()
defer chr.RUnlock()
_, exists := chr.Rings[cluster]
if !exists {
chr.Rings[cluster] = NewConsistentHashRing(int32(NodeReplicas), []string{})
}
return chr.Rings[cluster].Get(pk)
}
func (chr *ClusterHashRingType) IsHit(cluster string, pk string, currentNode string) bool {
node, err := chr.GetNode(cluster, pk)
if err != nil {
logger.Debugf("cluster:%s pk:%s failed to get node from hashring:%v", cluster, pk, err)
return false
}
return node == currentNode
}
func (chr *ClusterHashRingType) Set(cluster string, r *consistent.Consistent) {
chr.RLock()
defer chr.RUnlock()
chr.Rings[cluster] = r
logger.Infof("hash ring rebuild %+v", r.Members())
}
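
The change above replaces the per-cluster map of rings with a single ring guarded by a `sync.RWMutex`: lookups take the read lock, and a rebuild swaps in a fresh ring under the write lock. A minimal, self-contained sketch of that pattern follows. The `ring` type is a hypothetical stand-in for the `consistent` library used in the diff (its hashing is not the library's), and `SafeRing`, `newRing`, and the sample key are illustrative names only.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
	"strconv"
	"sync"
)

// ring is a tiny illustrative consistent-hash ring (NOT the library from the diff).
type ring struct {
	hashes []uint32          // sorted hashes of all virtual nodes
	owner  map[uint32]string // virtual-node hash -> real node
}

// newRing places `replicas` virtual nodes on the ring for every real node.
func newRing(replicas int, nodes []string) *ring {
	r := &ring{owner: make(map[uint32]string)}
	for _, n := range nodes {
		for i := 0; i < replicas; i++ {
			h := crc32.ChecksumIEEE([]byte(n + "#" + strconv.Itoa(i)))
			r.hashes = append(r.hashes, h)
			r.owner[h] = n
		}
	}
	sort.Slice(r.hashes, func(i, j int) bool { return r.hashes[i] < r.hashes[j] })
	return r
}

// get returns the node owning the first virtual node at or after the key's hash.
func (r *ring) get(key string) (string, error) {
	if len(r.hashes) == 0 {
		return "", fmt.Errorf("empty ring")
	}
	h := crc32.ChecksumIEEE([]byte(key))
	i := sort.Search(len(r.hashes), func(i int) bool { return r.hashes[i] >= h })
	if i == len(r.hashes) {
		i = 0 // wrap around to the start of the ring
	}
	return r.owner[r.hashes[i]], nil
}

// SafeRing guards the current ring: readers share RLock, Set takes the write lock.
type SafeRing struct {
	mu   sync.RWMutex
	ring *ring
}

func (s *SafeRing) GetNode(key string) (string, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.ring.get(key)
}

// Set swaps in a freshly built ring; the old one is never mutated in place.
func (s *SafeRing) Set(r *ring) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.ring = r
}

func main() {
	s := &SafeRing{ring: newRing(500, nil)}
	s.Set(newRing(500, []string{"10.2.3.4:19000", "10.2.3.5:19000"}))
	node, err := s.GetNode("alert_rule_42")
	fmt.Println(node, err)
}
```

Building a new ring and swapping the pointer keeps the write-lock section short, which is presumably why `RebuildConsistentHashRing` constructs the ring before calling `Set`.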

@@ -4,102 +4,102 @@ import (
"context"
"fmt"
"sort"
"strconv"
"strings"
"time"
"github.com/toolkits/pkg/logger"
"github.com/didi/nightingale/v5/src/models"
"github.com/didi/nightingale/v5/src/server/config"
"github.com/didi/nightingale/v5/src/storage"
)
// local servers
var localss map[string]string
var localss string
func Heartbeat(ctx context.Context) error {
localss = make(map[string]string)
if err := heartbeat(); err != nil {
if err := heartbeat(ctx); err != nil {
fmt.Println("failed to heartbeat:", err)
return err
}
go loopHeartbeat()
go loopHeartbeat(ctx)
return nil
}
func loopHeartbeat() {
func loopHeartbeat(ctx context.Context) {
interval := time.Duration(config.C.Heartbeat.Interval) * time.Millisecond
for {
time.Sleep(interval)
if err := heartbeat(); err != nil {
if err := heartbeat(ctx); err != nil {
logger.Warning(err)
}
}
}
func heartbeat() error {
var clusters []string
var err error
if config.C.ReaderFrom == "config" {
// the instance-to-cluster mapping is maintained in the config file
for i := 0; i < len(config.C.Readers); i++ {
clusters = append(clusters, config.C.Readers[i].ClusterName)
err := models.AlertingEngineHeartbeatWithCluster(config.C.Heartbeat.Endpoint, config.C.Readers[i].ClusterName)
if err != nil {
logger.Warningf("heartbeat with cluster %s err:%v", config.C.Readers[i].ClusterName, err)
continue
}
}
} else {
// the instance-to-cluster mapping is maintained in the web UI
clusters, err = models.AlertingEngineGetClusters(config.C.Heartbeat.Endpoint)
if err != nil {
return err
}
if len(clusters) == 0 {
// the instance was just deployed and no cluster has been configured in the web UI yet, so report heartbeats using the clusters from the config file
for i := 0; i < len(config.C.Readers); i++ {
err := models.AlertingEngineHeartbeatWithCluster(config.C.Heartbeat.Endpoint, config.C.Readers[i].ClusterName)
if err != nil {
logger.Warningf("heartbeat with cluster %s err:%v", config.C.Readers[i].ClusterName, err)
continue
}
}
}
// hash struct:
// /server/heartbeat/Default -> {
// 10.2.3.4:19000 => $timestamp
// 10.2.3.5:19000 => $timestamp
// }
func redisKey(cluster string) string {
return fmt.Sprintf("/server/heartbeat/%s", cluster)
}
err := models.AlertingEngineHeartbeat(config.C.Heartbeat.Endpoint)
if err != nil {
return err
}
func heartbeat(ctx context.Context) error {
now := time.Now().Unix()
key := redisKey(config.C.ClusterName)
err := storage.Redis.HSet(ctx, key, config.C.Heartbeat.Endpoint, now).Err()
if err != nil {
return err
}
for i := 0; i < len(clusters); i++ {
servers, err := ActiveServers(clusters[i])
if err != nil {
logger.Warningf("heartbeat %s get active server err: %v", clusters[i], err)
continue
}
servers, err := ActiveServers(ctx, config.C.ClusterName)
if err != nil {
return err
}
sort.Strings(servers)
newss := strings.Join(servers, " ")
oldss, exists := localss[clusters[i]]
if exists && oldss == newss {
continue
}
RebuildConsistentHashRing(clusters[i], servers)
localss[clusters[i]] = newss
sort.Strings(servers)
newss := strings.Join(servers, " ")
if newss != localss {
RebuildConsistentHashRing(servers)
localss = newss
}
return nil
}
func ActiveServers(cluster string) ([]string, error) {
if cluster == "" {
return nil, fmt.Errorf("cluster is empty")
func clearDeadServer(ctx context.Context, cluster, endpoint string) {
key := redisKey(cluster)
err := storage.Redis.HDel(ctx, key, endpoint).Err()
if err != nil {
logger.Warningf("failed to hdel %s %s, error: %v", key, endpoint, err)
}
}
func ActiveServers(ctx context.Context, cluster string) ([]string, error) {
ret, err := storage.Redis.HGetAll(ctx, redisKey(cluster)).Result()
if err != nil {
return nil, err
}
// a server is considered alive if it has sent a heartbeat within the last 30 seconds
return models.AlertingEngineGetsInstances("cluster = ? and clock > ?", cluster, time.Now().Unix()-30)
now := time.Now().Unix()
dur := int64(20)
actives := make([]string, 0, len(ret))
for endpoint, clockstr := range ret {
clock, err := strconv.ParseInt(clockstr, 10, 64)
if err != nil {
continue
}
if now-clock > dur {
clearDeadServer(ctx, cluster, endpoint)
continue
}
actives = append(actives, endpoint)
}
return actives, nil
}
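
The Redis-based `ActiveServers` above reads a hash of `endpoint => timestamp`, keeps endpoints whose last heartbeat is at most 20 seconds old, and deletes the stale ones. The liveness decision itself can be sketched as a pure function with no Redis dependency. `splitByLiveness` and its parameters are hypothetical names introduced only for illustration; the real code calls `clearDeadServer` inline instead of returning the dead list.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// splitByLiveness partitions heartbeat entries (endpoint -> unix-seconds string)
// into endpoints seen within ttl seconds of now and endpoints that have expired.
func splitByLiveness(beats map[string]string, now, ttl int64) (active, dead []string) {
	for endpoint, clockStr := range beats {
		clock, err := strconv.ParseInt(clockStr, 10, 64)
		if err != nil {
			continue // unparsable timestamps are skipped, mirroring the diff
		}
		if now-clock > ttl {
			dead = append(dead, endpoint)
			continue
		}
		active = append(active, endpoint)
	}
	// Map iteration order is random, so sort to get a stable result.
	sort.Strings(active)
	sort.Strings(dead)
	return active, dead
}

func main() {
	beats := map[string]string{
		"10.2.3.4:19000": "1000",
		"10.2.3.5:19000": "970",
		"10.2.3.6:19000": "not-a-number",
	}
	active, dead := splitByLiveness(beats, 1000, 20)
	fmt.Println("active:", active, "dead:", dead)
}
```

Sorting matters for the caller as well: `heartbeat` joins the sorted server list into a string and rebuilds the hash ring only when that string changes.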

@@ -1,14 +1,15 @@
package naming
import (
"context"
"sort"
"github.com/didi/nightingale/v5/src/server/config"
"github.com/toolkits/pkg/logger"
)
func IamLeader(cluster string) (bool, error) {
servers, err := ActiveServers(cluster)
func IamLeader() (bool, error) {
servers, err := ActiveServers(context.Background(), config.C.ClusterName)
if err != nil {
logger.Errorf("failed to get active servers: %v", err)
return false, err

@@ -0,0 +1,46 @@
package reader
import (
"net"
"net/http"
"time"
"github.com/didi/nightingale/v5/src/pkg/prom"
"github.com/didi/nightingale/v5/src/server/config"
"github.com/prometheus/client_golang/api"
)
var Client prom.API
func Init(opts config.ReaderOptions) error {
cli, err := api.NewClient(api.Config{
Address: opts.Url,
RoundTripper: &http.Transport{
// TLSClientConfig: tlsConfig,
Proxy: http.ProxyFromEnvironment,
DialContext: (&net.Dialer{
Timeout: time.Duration(opts.DialTimeout) * time.Millisecond,
KeepAlive: time.Duration(opts.KeepAlive) * time.Millisecond,
}).DialContext,
ResponseHeaderTimeout: time.Duration(opts.Timeout) * time.Millisecond,
TLSHandshakeTimeout: time.Duration(opts.TLSHandshakeTimeout) * time.Millisecond,
ExpectContinueTimeout: time.Duration(opts.ExpectContinueTimeout) * time.Millisecond,
MaxConnsPerHost: opts.MaxConnsPerHost,
MaxIdleConns: opts.MaxIdleConns,
MaxIdleConnsPerHost: opts.MaxIdleConnsPerHost,
IdleConnTimeout: time.Duration(opts.IdleConnTimeout) * time.Millisecond,
},
})
if err != nil {
return err
}
Client = prom.NewAPI(cli, prom.ClientOptions{
BasicAuthUser: opts.BasicAuthUser,
BasicAuthPass: opts.BasicAuthPass,
Headers: opts.Headers,
})
return nil
}
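
`Init` above converts every timeout from a millisecond integer into a `time.Duration` before handing it to `http.Transport`. A stdlib-only sketch of that conversion follows. `Options` is a hypothetical, trimmed stand-in for `config.ReaderOptions` (only a few fields are mirrored), and `ms` and `newTransport` are illustrative helper names; the Prometheus client wiring from the diff is deliberately left out.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"time"
)

// Options is a hypothetical subset of the reader options; durations are in milliseconds.
type Options struct {
	DialTimeout         int64
	KeepAlive           int64
	Timeout             int64
	IdleConnTimeout     int64
	MaxIdleConnsPerHost int
}

// ms converts a millisecond count into a time.Duration.
func ms(v int64) time.Duration { return time.Duration(v) * time.Millisecond }

// newTransport builds an http.Transport from millisecond-based options.
func newTransport(o Options) *http.Transport {
	return &http.Transport{
		Proxy: http.ProxyFromEnvironment,
		DialContext: (&net.Dialer{
			Timeout:   ms(o.DialTimeout),
			KeepAlive: ms(o.KeepAlive),
		}).DialContext,
		ResponseHeaderTimeout: ms(o.Timeout),
		IdleConnTimeout:       ms(o.IdleConnTimeout),
		MaxIdleConnsPerHost:   o.MaxIdleConnsPerHost,
	}
}

func main() {
	tr := newTransport(Options{
		DialTimeout:         5000,
		KeepAlive:           30000,
		Timeout:             30000,
		IdleConnTimeout:     90000,
		MaxIdleConnsPerHost: 100,
	})
	fmt.Println(tr.ResponseHeaderTimeout, tr.IdleConnTimeout, tr.MaxIdleConnsPerHost)
}
```

Forgetting the `* time.Millisecond` factor would make the values nanoseconds, so a small helper like `ms` keeps the unit explicit at every call site.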

@@ -14,10 +14,11 @@ import (
"github.com/didi/nightingale/v5/src/pkg/aop"
"github.com/didi/nightingale/v5/src/server/config"
"github.com/didi/nightingale/v5/src/server/naming"
promstat "github.com/didi/nightingale/v5/src/server/stat"
)
func New(version string, reloadFunc func()) *gin.Engine {
func New(version string) *gin.Engine {
gin.SetMode(config.C.RunMode)
loggerMid := aop.Logger()
@@ -36,12 +37,12 @@ func New(version string, reloadFunc func()) *gin.Engine {
r.Use(loggerMid)
}
configRoute(r, version, reloadFunc)
configRoute(r, version)
return r
}
func configRoute(r *gin.Engine, version string, reloadFunc func()) {
func configRoute(r *gin.Engine, version string) {
if config.C.HTTP.PProf {
pprof.Register(r, "/api/debug/pprof")
}
@@ -62,13 +63,8 @@ func configRoute(r *gin.Engine, version string, reloadFunc func()) {
c.String(200, version)
})
r.POST("/-/reload", func(c *gin.Context) {
reloadFunc()
c.String(200, "reload success")
})
r.GET("/servers/active", func(c *gin.Context) {
lst, err := naming.ActiveServers(ginx.QueryStr(c, "cluster"))
lst, err := naming.ActiveServers(c.Request.Context(), config.C.ClusterName)
ginx.NewRender(c).Data(lst, err)
})
@@ -100,14 +96,8 @@ func configRoute(r *gin.Engine, version string, reloadFunc func()) {
r.GET("/metrics", gin.WrapH(promhttp.Handler()))
r.GET("/log-sample-filter", logSampleFilterGet)
r.POST("/log-sample-filter", logSampleFilterAdd)
r.DELETE("/log-sample-filter", logSampleFilterDel)
service := r.Group("/v1/n9e")
service.POST("/event", pushEventToQueue)
service.POST("/make-event", makeEvent)
service.POST("/judge-event", judgeEvent)
}
func stat() gin.HandlerFunc {

@@ -3,6 +3,7 @@ package router
import (
"compress/gzip"
"compress/zlib"
"encoding/json"
"fmt"
"io/ioutil"
"net/http"
@@ -16,17 +17,14 @@ import (
promstat "github.com/didi/nightingale/v5/src/server/stat"
"github.com/didi/nightingale/v5/src/server/writer"
"github.com/gin-gonic/gin"
"github.com/mailru/easyjson"
"github.com/prometheus/common/model"
"github.com/prometheus/prometheus/prompb"
)
//easyjson:json
type TimeSeries struct {
Series []*DatadogMetric `json:"series"`
}
//easyjson:json
type DatadogMetric struct {
Metric string `json:"metric"`
Points []DatadogPoint `json:"points"`
@@ -34,7 +32,6 @@ type DatadogMetric struct {
Tags []string `json:"tags,omitempty"`
}
//easyjson:json
type DatadogPoint [2]float64
func (m *DatadogMetric) Clean() error {
@@ -217,7 +214,7 @@ func datadogSeries(c *gin.Context) {
}
var series TimeSeries
err = easyjson.Unmarshal(bs, &series)
err = json.Unmarshal(bs, &series)
if err != nil {
c.String(400, err.Error())
return
@@ -266,25 +263,13 @@ func datadogSeries(c *gin.Context) {
}
}
LogSample(c.Request.RemoteAddr, pt)
if config.C.WriterOpt.ShardingKey == "ident" {
if ident == "" {
writer.Writers.PushSample("-", pt)
} else {
writer.Writers.PushSample(ident, pt)
}
} else {
writer.Writers.PushSample(item.Metric, pt)
}
writer.Writers.PushSample(item.Metric, pt)
succ++
}
if succ > 0 {
cn := config.C.ClusterName
if cn != "" {
promstat.CounterSampleTotal.WithLabelValues(cn, "datadog").Add(float64(succ))
}
promstat.CounterSampleTotal.WithLabelValues(config.C.ClusterName, "datadog").Add(float64(succ))
idents.Idents.MSet(ids)
}
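
This hunk swaps `easyjson.Unmarshal` for the standard library's `json.Unmarshal` when decoding Datadog series payloads. A minimal sketch of that decoding follows, using only the struct fields visible in the diff (the real `DatadogMetric` has further fields that the hunk elides). The `decodeSeries` helper and the sample payload are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DatadogPoint is a [timestamp, value] pair, decoded from a two-element JSON array.
type DatadogPoint [2]float64

// DatadogMetric mirrors only the fields shown in the diff.
type DatadogMetric struct {
	Metric string         `json:"metric"`
	Points []DatadogPoint `json:"points"`
	Tags   []string       `json:"tags,omitempty"`
}

type TimeSeries struct {
	Series []*DatadogMetric `json:"series"`
}

// decodeSeries is a hypothetical helper wrapping json.Unmarshal.
func decodeSeries(bs []byte) (*TimeSeries, error) {
	var ts TimeSeries
	if err := json.Unmarshal(bs, &ts); err != nil {
		return nil, err
	}
	return &ts, nil
}

func main() {
	payload := []byte(`{"series":[{"metric":"cpu.idle","points":[[1658483785,93.5]],"tags":["env:prod"]}]}`)
	ts, err := decodeSeries(payload)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	m := ts.Series[0]
	fmt.Println(m.Metric, m.Points[0][0], m.Points[0][1], m.Tags)
}
```

The standard decoder needs no generated code or `//easyjson:json` annotations, which matches the removal of those comments in the diff; the trade-off is that it is reflection-based.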

Some files were not shown because too many files have changed in this diff