Compare commits
94 Commits
dashboard_
...
refactor_h
| Author | SHA1 | Date | |
|---|---|---|---|
| | 7094665c25 | | |
| | f1a5c2065c | | |
| | 6b9ceda9c1 | | |
| | 7390d42e62 | | |
| | a35f879dc0 | | |
| | 3fd4ea4853 | | |
| | 20f0a9d16d | | |
| | 5d4151983a | | |
| | 83b5f12474 | | |
| | 8c7bfb4f4a | | |
| | 4ccf887920 | | |
| | 546d9cb2cc | | |
| | 391b42a399 | | |
| | a916a0fc6b | | |
| | da9f5fbb12 | | |
| | ad3cf58bf3 | | |
| | a77dc15e36 | | |
| | 9ad51aeeff | | |
| | 2c7f030ea5 | | |
| | 039be7fc6c | | |
| | 9bff2509a8 | | |
| | 35b3cbb697 | | |
| | d81275b9c8 | | |
| | e29dd58823 | | |
| | b64aa03ccf | | |
| | 3893cb00a5 | | |
| | 4b6985c8af | | |
| | 7cc9470823 | | |
| | b97dfce0ad | | |
| | 357d3dff78 | | |
| | d0604f0c97 | | |
| | 8fafa0075b | | |
| | caa23fbba1 | | |
| | 4b9fea3cb2 | | |
| | f61a04f43f | | |
| | ef3588ff46 | | |
| | 3e3210bb81 | | |
| | da7ef5a92e | | |
| | 82b91164fe | | |
| | 033d45309f | | |
| | 60e9fb21f1 | | |
| | 508006ad01 | | |
| | 97d7b0574a | | |
| | c44aebd404 | | |
| | 2afa921a5d | | |
| | 313c820f1f | | |
| | 02f0b4579b | | |
| | 36eb308ef6 | | |
| | cd2db571cf | | |
| | a0cf12b171 | | |
| | 8358ab4b81 | | |
| | 0fc6cb8ef2 | | |
| | e1ab013c45 | | |
| | d984ad8bf4 | | |
| | 86fe3c7c43 | | |
| | 0f4478318e | | |
| | c0d0eb0e69 | | |
| | b62762b2e6 | | |
| | 810ca0e469 | | |
| | 33e3b224b9 | | |
| | 24d7b2b1bf | | |
| | 1d5ff1b28d | | |
| | ed5c8c5758 | | |
| | 01f7860900 | | |
| | a6bb03c8ba | | |
| | e9150b2ae0 | | |
| | 30d1ebd808 | | |
| | 2f69d92055 | | |
| | deeb40b4a0 | | |
| | 37f68fd52b | | |
| | 73828e50b5 | | |
| | 7e73850117 | | |
| | 3a075e7681 | | |
| | 4ec5612d78 | | |
| | 817ed0ab1b | | |
| | 63aa615761 | | |
| | 2a36902760 | | |
| | bca9331182 | | |
| | 199a23e385 | | |
| | c733f16cc7 | | |
| | 81585649aa | | |
| | 2c4422d657 | | |
| | aaf66cb386 | | |
| | cfed4d8318 | | |
| | 606cd538ec | | |
| | bafb3b2546 | | |
| | 9a0224697f | | |
| | 23156552db | | |
| | 36bca795fa | | |
| | b5503ae93e | | |
| | 3c102e47ed | | |
| | 60bf8139b1 | | |
| | fc0d077c9f | | |
| | 3a610f7ea0 | | |
3
.gitignore
vendored
@@ -41,7 +41,8 @@ _test
 /docker/pub
 /docker/n9e
 /docker/mysqldata
-/etc.local
+/docker/experience_pg_vm/pgdata
+/etc.local*
 
 .alerts
 .idea
@@ -2,6 +2,7 @@ before:
   hooks:
     # You may remove this if you don't use go modules.
     - go mod tidy
+    - go install github.com/rakyll/statik
 
 snapshot:
   name_template: '{{ .Tag }}'
@@ -115,7 +116,7 @@ dockers:
     goarch: arm64
     ids:
       - build
-    dockerfile: docker/Dockerfile.goreleaser
+    dockerfile: docker/Dockerfile.goreleaser.arm64
     extra_files:
       - pub
       - etc
9
Makefile
@@ -1,4 +1,4 @@
-.PHONY: start build
+.PHONY: prebuild start build
 
 ROOT:=$(shell pwd -P)
 GIT_COMMIT:=$(shell git --work-tree ${ROOT} rev-parse 'HEAD^{commit}')
@@ -6,6 +6,11 @@ _GIT_VERSION:=$(shell git --work-tree ${ROOT} describe --tags --abbrev=14 "${GIT
 TAG=$(shell echo "${_GIT_VERSION}" | awk -F"-" '{print $$1}')
 RELEASE_VERSION:="$(TAG)-$(GIT_COMMIT)"
 
+prebuild:
+	echo "begin download and embed the front-end file..."
+	sh fe.sh
+	echo "front-end file download and embedding completed."
+
 all: build
 
 build:
@@ -17,7 +22,7 @@ build-alert:
 build-pushgw:
 	go build -ldflags "-w -s -X github.com/ccfos/nightingale/v6/pkg/version.Version=$(RELEASE_VERSION)" -o n9e-pushgw ./cmd/pushgw/main.go
 
-build-cli:
+build-cli:
 	go build -ldflags "-w -s -X github.com/ccfos/nightingale/v6/pkg/version.Version=$(RELEASE_VERSION)" -o n9e-cli ./cmd/cli/main.go
 
 run:
114
README.md
@@ -20,117 +20,36 @@
|
||||
<img alt="License" src="https://img.shields.io/badge/license-Apache--2.0-blue"/>
|
||||
</p>
|
||||
<p align="center">
|
||||
<b>All-in-one</b> 的开源观测平台 <br/>
|
||||
<b>开箱即用</b>,集数据采集、可视化、监控告警于一体 <br/>
|
||||
推荐升级您的 <b>Prometheus + AlertManager + Grafana + ELK + Jaeger</b> 组合方案到夜莺!
|
||||
告警管理专家,一体化开源观测平台!
|
||||
</p>
|
||||
|
||||
[English](./README_en.md) | [中文](./README.md)
|
||||
|
||||
## 资料
|
||||
|
||||
- 文档:[https://flashcat.cloud/docs/](https://flashcat.cloud/docs/)
|
||||
- 论坛提问:[https://answer.flashcat.cloud/](https://answer.flashcat.cloud/)
|
||||
- 报Bug:[https://github.com/ccfos/nightingale/issues](https://github.com/ccfos/nightingale/issues/new?assignees=&labels=kind%2Fbug&projects=&template=bug_report.yml)
|
||||
- 商业版本:[企业版](https://mp.weixin.qq.com/s/FOwnnGPkRao2ZDV574EHrw) | [专业版](https://mp.weixin.qq.com/s/uM2a8QUDJEYwdBpjkbQDxA) 感兴趣请 [联系我们交流试用](https://flashcat.cloud/contact/)
|
||||
|
||||
## 功能和特点
|
||||
|
||||
- **开箱即用**
|
||||
- 支持 Docker、Helm Chart、云服务等多种部署方式,集数据采集、监控告警、可视化为一体,内置多种监控仪表盘、快捷视图、告警规则模板,导入即可快速使用,**大幅降低云原生监控系统的建设成本、学习成本、使用成本**;
|
||||
- **专业告警**
|
||||
- 可视化的告警配置和管理,支持丰富的告警规则,提供屏蔽规则、订阅规则的配置能力,支持告警多种送达渠道,支持告警自愈、告警事件管理等;
|
||||
- **推荐您使用夜莺的同时,无缝搭配[FlashDuty](https://flashcat.cloud/product/flashcat-duty/),实现告警聚合收敛、认领、升级、排班、协同,让告警的触达既高效,又确保告警处理不遗漏、做到件件有回响**。
|
||||
- **云原生**
|
||||
- 以交钥匙的方式快速构建企业级的云原生监控体系,支持 [Categraf](https://github.com/flashcatcloud/categraf)、Telegraf、Grafana-agent 等多种采集器,支持 Prometheus、VictoriaMetrics、M3DB、ElasticSearch、Jaeger 等多种数据源,兼容支持导入 Grafana 仪表盘,**与云原生生态无缝集成**;
|
||||
- **高性能 高可用**
|
||||
- 得益于夜莺的多数据源管理引擎,和夜莺引擎侧优秀的架构设计,借助于高性能时序库,可以满足数亿时间线的采集、存储、告警分析场景,节省大量成本;
|
||||
- 夜莺监控组件均可水平扩展,无单点,已在上千家企业部署落地,经受了严苛的生产实践检验。众多互联网头部公司,夜莺集群机器达百台,处理数亿级时间线,重度使用夜莺监控;
|
||||
- **灵活扩展 中心化管理**
|
||||
- 夜莺监控,可部署在 1 核 1G 的云主机,可在上百台机器集群化部署,可运行在 K8s 中;也可将时序库、告警引擎等组件下沉到各机房、各 Region,兼顾边缘部署和中心化统一管理,**解决数据割裂,缺乏统一视图的难题**;
|
||||
- **开放社区**
|
||||
- 托管于[中国计算机学会开源发展委员会](https://www.ccf.org.cn/kyfzwyh/),有[快猫星云](https://flashcat.cloud)和众多公司的持续投入,和数千名社区用户的积极参与,以及夜莺监控项目清晰明确的定位,都保证了夜莺开源社区健康、长久的发展。活跃、专业的社区用户也在持续迭代和沉淀更多的最佳实践于产品中;
|
||||
|
||||
## 使用场景
|
||||
1. **如果您希望在一个平台中,统一管理和查看 Metrics、Logging、Tracing 数据,推荐你使用夜莺**:
|
||||
- 请参考阅读:[不止于监控,夜莺 V6 全新升级为开源观测平台](http://flashcat.cloud/blog/nightingale-v6-release/)
|
||||
2. **如果您在使用 Prometheus 过程中,有以下的一个或者多个需求场景,推荐您无缝升级到夜莺**:
|
||||
- Prometheus、Alertmanager、Grafana 等多个系统较为割裂,缺乏统一视图,无法开箱即用;
|
||||
- 通过修改配置文件来管理 Prometheus、Alertmanager 的方式,学习曲线大,协同有难度;
|
||||
- 数据量过大而无法扩展您的 Prometheus 集群;
|
||||
- 生产环境运行多套 Prometheus 集群,面临管理和使用成本高的问题;
|
||||
3. **如果您在使用 Zabbix,有以下的场景,推荐您升级到夜莺**:
|
||||
- 监控的数据量太大,希望有更好的扩展解决方案;
|
||||
- 学习曲线高,多人多团队模式下,希望有更好的协同使用效率;
|
||||
- 微服务和云原生架构下,监控数据的生命周期多变、监控数据维度基数高,Zabbix 数据模型不易适配;
|
||||
- 了解更多Zabbix和夜莺监控的对比,推荐您进一步阅读[Zabbix 和夜莺监控选型对比](https://flashcat.cloud/blog/zabbx-vs-nightingale/)
|
||||
4. **如果您在使用 [Open-Falcon](https://github.com/open-falcon/falcon-plus),我们推荐您升级到夜莺:**
|
||||
- 关于 Open-Falcon 和夜莺的详细介绍,请参考阅读:[云原生监控的十个特点和趋势](http://flashcat.cloud/blog/10-trends-of-cloudnative-monitoring/)
|
||||
- 监控系统和可观测平台的区别,请参考阅读:[从监控系统到可观测平台,Gap有多大
|
||||
](https://flashcat.cloud/blog/gap-of-monitoring-to-o11y/)
|
||||
5. **我们推荐您使用 [Categraf](https://github.com/flashcatcloud/categraf) 作为首选的监控数据采集器**:
|
||||
- [Categraf](https://github.com/flashcatcloud/categraf) 是夜莺监控的默认采集器,采用开放插件机制和 All-in-one 的设计理念,同时支持 metric、log、trace、event 的采集。Categraf 不仅可以采集 CPU、内存、网络等系统层面的指标,也集成了众多开源组件的采集能力,支持K8s生态。Categraf 内置了对应的仪表盘和告警规则,开箱即用。
|
||||
|
||||
## 文档
|
||||
|
||||
[English Doc](https://n9e.github.io/) | [中文文档](https://flashcat.cloud/docs/)
|
||||
- **统一接入各种时序库**:支持对接 Prometheus、VictoriaMetrics、Thanos、Mimir、M3DB 等多种时序库,实现统一告警管理
|
||||
- **专业告警能力**:内置支持多种告警规则,可以扩展支持所有通知媒介,支持告警屏蔽、告警抑制、告警自愈、告警事件管理
|
||||
- **无缝搭配 [FlashDuty](https://flashcat.cloud/product/flashcat-duty/)**:实现告警聚合收敛、认领、升级、排班、IM集成,确保告警处理不遗漏,减少打扰,更好协同
|
||||
- **支持所有常见采集器**:支持 categraf、telegraf、grafana-agent、datadog-agent、给类 exporter 作为采集器,没有什么数据是不能监控的
|
||||
- **统一的观测平台**:从 v6 版本开始,支持接入 ElasticSearch、Jaeger 数据源,逐步实现日志、链路、指标的一体化观测
|
||||
|
||||
## 产品示意图
|
||||
|
||||
https://user-images.githubusercontent.com/792850/216888712-2565fcea-9df5-47bd-a49e-d60af9bd76e8.mp4
|
||||
|
||||
## 夜莺架构
|
||||
|
||||
夜莺监控可以接收各种采集器上报的监控数据(比如 [Categraf](https://github.com/flashcatcloud/categraf)、telegraf、grafana-agent、Prometheus),并写入多种流行的时序数据库中(可以支持Prometheus、M3DB、VictoriaMetrics、Thanos、TDEngine等),提供告警规则、屏蔽规则、订阅规则的配置能力,提供监控数据的查看能力,提供告警自愈机制(告警触发之后自动回调某个webhook地址或者执行某个脚本),提供历史告警事件的存储管理、分组查看的能力。
|
||||
## 加入交流群
|
||||
|
||||
### 中心汇聚式部署方案
|
||||
欢迎加入 QQ 交流群,群号:479290895,也可以扫下方二维码加入微信交流群:
|
||||
|
||||

|
||||
|
||||
夜莺只有一个模块,就是 n9e,可以部署多个 n9e 实例组成集群,n9e 依赖 2 个存储,数据库、Redis,数据库可以使用 MySQL 或 Postgres,自己按需选用。
|
||||
|
||||
n9e 提供的是 HTTP 接口,前面负载均衡可以是 4 层的,也可以是 7 层的。一般就选用 Nginx 就可以了。
|
||||
|
||||
n9e 这个模块接收到数据之后,需要转发给后端的时序库,相关配置是:
|
||||
|
||||
```toml
|
||||
[Pushgw]
|
||||
LabelRewrite = true
|
||||
[[Pushgw.Writers]]
|
||||
Url = "http://127.0.0.1:9090/api/v1/write"
|
||||
```
|
||||
|
||||
> 注意:虽然数据源可以在页面配置了,但是上报转发链路,还是需要在配置文件指定。
|
||||
|
||||
所有机房的 agent( 比如 Categraf、Telegraf、 Grafana-agent、Datadog-agent ),都直接推数据给 n9e,这个架构最为简单,维护成本最低。当然,前提是要求机房之间网络链路比较好,一般有专线。如果网络链路不好,则要使用下面的部署方式了。
|
||||
|
||||
### 边缘下沉式混杂部署方案
|
||||
|
||||

|
||||
|
||||
这个图尝试解释 3 种不同的情形,比如 A 机房和中心网络链路很好,Categraf 可以直接汇报数据给中心 n9e 模块,另一个机房网络链路不好,就需要把时序库下沉部署,时序库下沉了,对应的告警引擎和转发网关也都要跟随下沉,这样数据不会跨机房传输,比较稳定。但是心跳还是需要往中心心跳,要不然在对象列表里看不到机器的 CPU、内存使用率。还有的时候,可能是接入的一个已有的 Prometheus,数据采集没有走 Categraf,那此时只需要把 Prometheus 作为数据源接入夜莺即可,可以在夜莺里看图、配告警规则,但是就是在对象列表里看不到,也不能使用告警自愈的功能,问题也不大,核心功能都不受影响。
|
||||
|
||||
边缘机房,下沉部署时序库、告警引擎、转发网关的时候,要注意,告警引擎需要依赖数据库,因为要同步告警规则,转发网关也要依赖数据库,因为要注册对象到数据库里去,需要打通相关网络,告警引擎和转发网关都不用Redis,所以无需为 Redis 打通网络。
|
||||
|
||||
### VictoriaMetrics 集群架构
|
||||
<img src="doc/img/install-vm.png" width="600">
|
||||
|
||||
如果单机版本的时序数据库(比如 Prometheus) 性能有瓶颈或容灾较差,我们推荐使用 [VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics),VictoriaMetrics 架构较为简单,性能优异,易于部署和运维,架构图如上。VictoriaMetrics 更详尽的文档,还请参考其[官网](https://victoriametrics.com/)。
|
||||
|
||||
## 夜莺社区
|
||||
|
||||
开源项目要更有生命力,离不开开放的治理架构和源源不断的开发者和用户共同参与,我们致力于建立开放、中立的开源治理架构,吸纳更多来自企业、高校等各方面对云原生监控感兴趣、有热情的开发者,一起打造有活力的夜莺开源社区。关于《夜莺开源项目和社区治理架构(草案)》,请查阅 [COMMUNITY GOVERNANCE](./doc/community-governance.md).
|
||||
|
||||
**我们欢迎您以各种方式参与到夜莺开源项目和开源社区中来,工作包括不限于**:
|
||||
- 补充和完善文档 => [n9e.github.io](https://n9e.github.io/)
|
||||
- 分享您在使用夜莺监控过程中的最佳实践和经验心得 => [文章分享](https://flashcat.cloud/docs/content/flashcat-monitor/nightingale/share/)
|
||||
- 提交产品建议 =》 [github issue](https://github.com/ccfos/nightingale/issues/new?assignees=&labels=kind%2Ffeature&template=enhancement.md)
|
||||
- 提交代码,让夜莺监控更快、更稳、更好用 => [github pull request](https://github.com/didi/nightingale/pulls)
|
||||
|
||||
**尊重、认可和记录每一位贡献者的工作**是夜莺开源社区的第一指导原则,我们提倡**高效的提问**,这既是对开发者时间的尊重,也是对整个社区知识沉淀的贡献:
|
||||
- 提问之前请先查阅 [FAQ](https://www.gitlink.org.cn/ccfos/nightingale/wiki/faq)
|
||||
- 我们使用[论坛](https://answer.flashcat.cloud/)进行交流,有问题可以到这里搜索、提问
|
||||
- 我们也推荐你加入微信群,和其他夜莺用户交流经验 (请先加好友:[picobyte](https://www.gitlink.org.cn/UlricQin/gist/tree/master/self.jpeg) 备注:夜莺加群+姓名+公司)
|
||||
|
||||
|
||||
## Who is using Nightingale
|
||||
|
||||
您可以通过在 **[Who is Using Nightingale](https://github.com/ccfos/nightingale/issues/897)** 登记您的使用情况,分享您的使用经验。
|
||||
<img src="doc/img/wecom.png" width="240">
|
||||
|
||||
## Stargazers over time
|
||||
[](https://starchart.cc/ccfos/nightingale)
|
||||
@@ -143,6 +62,7 @@ Url = "http://127.0.0.1:9090/api/v1/write"
|
||||
## License
|
||||
[Apache License V2.0](https://github.com/didi/nightingale/blob/main/LICENSE)
|
||||
|
||||
## 加入交流群
|
||||
## 社区管理
|
||||
|
||||
[夜莺开源项目和社区治理架构(草案)](./doc/community-governance.md)
|
||||
|
||||
<img src="doc/img/wecom.png" width="120">
|
||||
@@ -23,7 +23,6 @@ import (
 	"github.com/ccfos/nightingale/v6/prom"
 	"github.com/ccfos/nightingale/v6/pushgw/pconf"
 	"github.com/ccfos/nightingale/v6/pushgw/writer"
-	"github.com/ccfos/nightingale/v6/storage"
 )
 
 func Initialize(configDir string, cryptoKey string) (func(), error) {
@@ -37,21 +36,12 @@ func Initialize(configDir string, cryptoKey string) (func(), error) {
 		return nil, err
 	}
 
-	db, err := storage.New(config.DB)
-	if err != nil {
-		return nil, err
-	}
-	ctx := ctx.NewContext(context.Background(), db)
-
-	redis, err := storage.NewRedis(config.Redis)
-	if err != nil {
-		return nil, err
-	}
+	ctx := ctx.NewContext(context.Background(), nil, false, config.CenterApi)
 
 	syncStats := memsto.NewSyncStats()
 	alertStats := astats.NewSyncStats()
 
-	targetCache := memsto.NewTargetCache(ctx, syncStats, redis)
+	targetCache := memsto.NewTargetCache(ctx, syncStats, nil)
 	busiGroupCache := memsto.NewBusiGroupCache(ctx, syncStats)
 	alertMuteCache := memsto.NewAlertMuteCache(ctx, syncStats)
 	alertRuleCache := memsto.NewAlertRuleCache(ctx, syncStats)
@@ -62,7 +52,7 @@ func Initialize(configDir string, cryptoKey string) (func(), error) {
 
 	externalProcessors := process.NewExternalProcessors()
 
-	Start(config.Alert, config.Pushgw, syncStats, alertStats, externalProcessors, targetCache, busiGroupCache, alertMuteCache, alertRuleCache, notifyConfigCache, dsCache, ctx, promClients, false)
+	Start(config.Alert, config.Pushgw, syncStats, alertStats, externalProcessors, targetCache, busiGroupCache, alertMuteCache, alertRuleCache, notifyConfigCache, dsCache, ctx, promClients)
 
 	r := httpx.GinEngine(config.Global.RunMode, config.HTTP)
 	rt := router.New(config.HTTP, config.Alert, alertMuteCache, targetCache, busiGroupCache, alertStats, ctx, externalProcessors)
@@ -77,7 +67,7 @@ func Initialize(configDir string, cryptoKey string) (func(), error) {
 }
 
 func Start(alertc aconf.Alert, pushgwc pconf.Pushgw, syncStats *memsto.Stats, alertStats *astats.Stats, externalProcessors *process.ExternalProcessorsType, targetCache *memsto.TargetCacheType, busiGroupCache *memsto.BusiGroupCacheType,
-	alertMuteCache *memsto.AlertMuteCacheType, alertRuleCache *memsto.AlertRuleCacheType, notifyConfigCache *memsto.NotifyConfigCacheType, datasourceCache *memsto.DatasourceCacheType, ctx *ctx.Context, promClients *prom.PromClientMap, isCenter bool) {
+	alertMuteCache *memsto.AlertMuteCacheType, alertRuleCache *memsto.AlertRuleCacheType, notifyConfigCache *memsto.NotifyConfigCacheType, datasourceCache *memsto.DatasourceCacheType, ctx *ctx.Context, promClients *prom.PromClientMap) {
 	userCache := memsto.NewUserCache(ctx, syncStats)
 	userGroupCache := memsto.NewUserGroupCache(ctx, syncStats)
 	alertSubscribeCache := memsto.NewAlertSubscribeCache(ctx, syncStats)
@@ -85,12 +75,12 @@ func Start(alertc aconf.Alert, pushgwc pconf.Pushgw, syncStats *memsto.Stats, al
 
 	go models.InitNotifyConfig(ctx, alertc.Alerting.TemplatesDir)
 
-	naming := naming.NewNaming(ctx, alertc.Heartbeat, isCenter)
+	naming := naming.NewNaming(ctx, alertc.Heartbeat)
 
 	writers := writer.NewWriters(pushgwc)
 	record.NewScheduler(alertc, recordingRuleCache, promClients, writers, alertStats)
 
-	eval.NewScheduler(isCenter, alertc, externalProcessors, alertRuleCache, targetCache, busiGroupCache, alertMuteCache, datasourceCache, promClients, naming, ctx, alertStats)
+	eval.NewScheduler(alertc, externalProcessors, alertRuleCache, targetCache, busiGroupCache, alertMuteCache, datasourceCache, promClients, naming, ctx, alertStats)
 
 	dp := dispatch.NewDispatch(alertRuleCache, userCache, userGroupCache, alertSubscribeCache, targetCache, notifyConfigCache, alertc.Alerting, ctx)
 	consumer := dispatch.NewConsumer(alertc.Alerting, ctx, dp)
@@ -8,6 +8,7 @@ import (
|
||||
"github.com/ccfos/nightingale/v6/alert/queue"
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
"github.com/ccfos/nightingale/v6/pkg/ctx"
|
||||
"github.com/ccfos/nightingale/v6/pkg/poster"
|
||||
|
||||
"github.com/toolkits/pkg/concurrent/semaphore"
|
||||
"github.com/toolkits/pkg/logger"
|
||||
@@ -82,78 +83,17 @@ func (e *Consumer) consumeOne(event *models.AlertCurEvent) {
|
||||
}
|
||||
|
||||
func (e *Consumer) persist(event *models.AlertCurEvent) {
|
||||
has, err := models.AlertCurEventExists(e.ctx, "hash=?", event.Hash)
|
||||
if err != nil {
|
||||
logger.Errorf("event_persist_check_exists_fail: %v rule_id=%d hash=%s", err, event.RuleId, event.Hash)
|
||||
return
|
||||
}
|
||||
|
||||
his := event.ToHis(e.ctx)
|
||||
|
||||
// 不管是告警还是恢复,全量告警里都要记录
|
||||
if err := his.Add(e.ctx); err != nil {
|
||||
logger.Errorf(
|
||||
"event_persist_his_fail: %v rule_id=%d cluster:%s hash=%s tags=%v timestamp=%d value=%s",
|
||||
err,
|
||||
event.RuleId,
|
||||
event.Cluster,
|
||||
event.Hash,
|
||||
event.TagsJSON,
|
||||
event.TriggerTime,
|
||||
event.TriggerValue,
|
||||
)
|
||||
}
|
||||
|
||||
if has {
|
||||
// 活跃告警表中有记录,删之
|
||||
err = models.AlertCurEventDelByHash(e.ctx, event.Hash)
|
||||
if !e.ctx.IsCenter {
|
||||
event.DB2FE()
|
||||
err := poster.PostByUrls(e.ctx, "/v1/n9e/event-persist", event)
|
||||
if err != nil {
|
||||
logger.Errorf("event_del_cur_fail: %v hash=%s", err, event.Hash)
|
||||
return
|
||||
logger.Errorf("event%+v persist err:%v", event, err)
|
||||
}
|
||||
|
||||
if !event.IsRecovered {
|
||||
// 恢复事件,从活跃告警列表彻底删掉,告警事件,要重新加进来新的event
|
||||
// use his id as cur id
|
||||
event.Id = his.Id
|
||||
if event.Id > 0 {
|
||||
if err := event.Add(e.ctx); err != nil {
|
||||
logger.Errorf(
|
||||
"event_persist_cur_fail: %v rule_id=%d cluster:%s hash=%s tags=%v timestamp=%d value=%s",
|
||||
err,
|
||||
event.RuleId,
|
||||
event.Cluster,
|
||||
event.Hash,
|
||||
event.TagsJSON,
|
||||
event.TriggerTime,
|
||||
event.TriggerValue,
|
||||
)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return
|
||||
}
|
||||
|
||||
if event.IsRecovered {
|
||||
// alert_cur_event表里没有数据,表示之前没告警,结果现在报了恢复,神奇....理论上不应该出现的
|
||||
return
|
||||
}
|
||||
|
||||
// use his id as cur id
|
||||
event.Id = his.Id
|
||||
if event.Id > 0 {
|
||||
if err := event.Add(e.ctx); err != nil {
|
||||
logger.Errorf(
|
||||
"event_persist_cur_fail: %v rule_id=%d cluster:%s hash=%s tags=%v timestamp=%d value=%s",
|
||||
err,
|
||||
event.RuleId,
|
||||
event.Cluster,
|
||||
event.Hash,
|
||||
event.TagsJSON,
|
||||
event.TriggerTime,
|
||||
event.TriggerValue,
|
||||
)
|
||||
}
|
||||
err := models.EventPersist(e.ctx, event)
|
||||
if err != nil {
|
||||
logger.Errorf("event%+v persist err:%v", event, err)
|
||||
}
|
||||
}
|
||||
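Review note: the rewritten `persist` appears to branch on `ctx.IsCenter`. An edge instance forwards the event to the center's `/v1/n9e/event-persist` endpoint, while the center writes it to the database itself. The sketch below illustrates only that dispatch. `Event`, `Forwarder`, `Store`, and `recorder` are hypothetical stand-ins for `models.AlertCurEvent`, `poster.PostByUrls`, and `models.EventPersist`; they are assumptions for illustration and not the project's real types.

```go
package main

import (
	"errors"
	"fmt"
)

// Event is a hypothetical stand-in for models.AlertCurEvent.
type Event struct {
	Hash string
}

// Forwarder is an assumed abstraction over an HTTP call to the center.
type Forwarder interface {
	Post(path string, ev *Event) error
}

// Store is an assumed abstraction over the local database write.
type Store interface {
	Persist(ev *Event) error
}

// Consumer mirrors the shape of the diff: one flag decides the path.
type Consumer struct {
	IsCenter bool
	Center   Forwarder
	DB       Store
}

// Persist forwards on edge instances and writes locally on the center.
func (c *Consumer) Persist(ev *Event) error {
	if !c.IsCenter {
		if err := c.Center.Post("/v1/n9e/event-persist", ev); err != nil {
			return fmt.Errorf("forward event %s: %w", ev.Hash, err)
		}
		return nil
	}
	if err := c.DB.Persist(ev); err != nil {
		return fmt.Errorf("persist event %s: %w", ev.Hash, err)
	}
	return nil
}

// recorder is a fake that implements both interfaces and logs each call.
type recorder struct {
	calls []string
	fail  bool
}

func (r *recorder) Post(path string, ev *Event) error {
	r.calls = append(r.calls, "post "+path+" "+ev.Hash)
	if r.fail {
		return errors.New("center unreachable")
	}
	return nil
}

func (r *recorder) Persist(ev *Event) error {
	r.calls = append(r.calls, "db "+ev.Hash)
	return nil
}

func main() {
	rec := &recorder{}
	edge := &Consumer{IsCenter: false, Center: rec, DB: rec}
	center := &Consumer{IsCenter: true, Center: rec, DB: rec}

	_ = edge.Persist(&Event{Hash: "a1"})
	_ = center.Persist(&Event{Hash: "b2"})
	fmt.Println(rec.calls)
}
```

Unlike the diff, which only logs a failed forward, this sketch returns the error so the caller can decide whether to retry.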
@@ -28,8 +28,9 @@ type Dispatch struct {
 
 	alerting aconf.Alerting
 
-	senders map[string]sender.Sender
-	tpls    map[string]*template.Template
+	senders      map[string]sender.Sender
+	tpls         map[string]*template.Template
+	ExtraSenders map[string]sender.Sender
 
 	ctx *ctx.Context
 
@@ -50,8 +51,9 @@ func NewDispatch(alertRuleCache *memsto.AlertRuleCacheType, userCache *memsto.Us
 
 		alerting: alerting,
 
-		senders: make(map[string]sender.Sender),
-		tpls:    make(map[string]*template.Template),
+		senders:      make(map[string]sender.Sender),
+		tpls:         make(map[string]*template.Template),
+		ExtraSenders: make(map[string]sender.Sender),
 
 		ctx: ctx,
 	}
@@ -89,6 +91,12 @@ func (e *Dispatch) relaodTpls() error {
 		models.Telegram: sender.NewSender(models.Telegram, tmpTpls, smtp),
 	}
 
+	e.RwLock.RLock()
+	for channel, sender := range e.ExtraSenders {
+		senders[channel] = sender
+	}
+	e.RwLock.RUnlock()
+
 	e.RwLock.Lock()
 	e.tpls = tmpTpls
 	e.senders = senders
@@ -180,7 +188,7 @@ func (e *Dispatch) Send(rule *models.AlertRule, event *models.AlertCurEvent, not
 			s := e.senders[channel]
 			e.RwLock.RUnlock()
 			if s == nil {
-				logger.Warningf("no sender for channel: %s", channel)
+				logger.Debugf("no sender for channel: %s", channel)
 				continue
 			}
 			logger.Debugf("send event: %s, channel: %s", event.Hash, channel)
@@ -191,7 +199,7 @@ func (e *Dispatch) Send(rule *models.AlertRule, event *models.AlertCurEvent, not
 	}
 
 	// handle event callbacks
-	sender.SendCallbacks(e.ctx, notifyTarget.ToCallbackList(), event, e.targetCache, e.notifyConfigCache.GetIbex())
+	sender.SendCallbacks(e.ctx, notifyTarget.ToCallbackList(), event, e.targetCache, e.userCache, e.notifyConfigCache.GetIbex())
 
 	// handle global webhooks
 	sender.SendWebhooks(notifyTarget.ToWebhookList(), event)
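Review note: the `relaodTpls` hunk copies `ExtraSenders` into the freshly built sender map under a read lock, then swaps the finished map in under the write lock. The sketch below shows that build-then-swap pattern in isolation. `Sender`, `namedSender`, `Reload`, and `Channels` are hypothetical names chosen for this example, not identifiers from the project.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Sender is a hypothetical stand-in for sender.Sender.
type Sender interface {
	Name() string
}

type namedSender string

func (n namedSender) Name() string { return string(n) }

// Dispatch keeps the active senders plus externally registered extras.
type Dispatch struct {
	mu           sync.RWMutex
	senders      map[string]Sender
	extraSenders map[string]Sender
}

// Reload builds the next sender map off to the side, merges the extras
// while holding only a read lock, and swaps the map in under the write lock.
func (d *Dispatch) Reload(builtin map[string]Sender) {
	next := make(map[string]Sender, len(builtin))
	for ch, s := range builtin {
		next[ch] = s
	}

	d.mu.RLock()
	for ch, s := range d.extraSenders {
		next[ch] = s // an extra sender replaces a built-in with the same channel
	}
	d.mu.RUnlock()

	d.mu.Lock()
	d.senders = next
	d.mu.Unlock()
}

// Channels returns the active channel names in sorted order.
func (d *Dispatch) Channels() []string {
	d.mu.RLock()
	defer d.mu.RUnlock()
	out := make([]string, 0, len(d.senders))
	for ch := range d.senders {
		out = append(out, ch)
	}
	sort.Strings(out)
	return out
}

func main() {
	d := &Dispatch{extraSenders: map[string]Sender{"pager": namedSender("pager")}}
	d.Reload(map[string]Sender{"email": namedSender("email")})
	fmt.Println(d.Channels())
}
```

Building the map before taking the write lock keeps the exclusive section down to a single pointer swap, so `Send` is blocked only briefly during a reload.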
@@ -16,7 +16,6 @@ import (
 )
 
 type Scheduler struct {
-	isCenter bool
 	// key: hash
 	alertRules map[string]*AlertRuleWorker
 
@@ -38,11 +37,10 @@ type Scheduler struct {
 	stats *astats.Stats
 }
 
-func NewScheduler(isCenter bool, aconf aconf.Alert, externalProcessors *process.ExternalProcessorsType, arc *memsto.AlertRuleCacheType, targetCache *memsto.TargetCacheType,
+func NewScheduler(aconf aconf.Alert, externalProcessors *process.ExternalProcessorsType, arc *memsto.AlertRuleCacheType, targetCache *memsto.TargetCacheType,
 	busiGroupCache *memsto.BusiGroupCacheType, alertMuteCache *memsto.AlertMuteCacheType, datasourceCache *memsto.DatasourceCacheType, promClients *prom.PromClientMap, naming *naming.Naming,
 	ctx *ctx.Context, stats *astats.Stats) *Scheduler {
 	scheduler := &Scheduler{
-		isCenter: isCenter,
 		aconf: aconf,
 		alertRules: make(map[string]*AlertRuleWorker),
 
@@ -108,7 +106,7 @@ func (s *Scheduler) syncAlertRules() {
 				alertRule := NewAlertRuleWorker(rule, dsId, processor, s.promClients, s.ctx)
 				alertRuleWorkers[alertRule.Hash()] = alertRule
 			}
-		} else if rule.IsHostRule() && s.isCenter {
+		} else if rule.IsHostRule() && s.ctx.IsCenter {
 			// all host rule will be processed by center instance
 			if !naming.DatasourceHashRing.IsHit(naming.HostDatasource, fmt.Sprintf("%d", rule.Id), s.aconf.Heartbeat.Endpoint) {
 				continue
@@ -109,7 +109,7 @@ func (arw *AlertRuleWorker) Eval() {
 }
 
 func (arw *AlertRuleWorker) Stop() {
-	logger.Infof("%s stopped", arw.Key())
+	logger.Infof("rule_eval %s stopped", arw.Key())
 	close(arw.quit)
 }
 
@@ -1,6 +1,7 @@
 package naming
 
 import (
+	"errors"
 	"sync"
 
 	"github.com/toolkits/pkg/consistent"
@@ -39,8 +40,8 @@ func RebuildConsistentHashRing(datasourceId int64, nodes []string) {
 }
 
 func (chr *DatasourceHashRingType) GetNode(datasourceId int64, pk string) (string, error) {
-	chr.RLock()
-	defer chr.RUnlock()
+	chr.Lock()
+	defer chr.Unlock()
 	_, exists := chr.Rings[datasourceId]
 	if !exists {
 		chr.Rings[datasourceId] = NewConsistentHashRing(int32(NodeReplicas), []string{})
@@ -52,14 +53,18 @@ func (chr *DatasourceHashRingType) GetNode(datasourceId int64, pk string) (strin
 func (chr *DatasourceHashRingType) IsHit(datasourceId int64, pk string, currentNode string) bool {
 	node, err := chr.GetNode(datasourceId, pk)
 	if err != nil {
-		logger.Debugf("datasource id:%d pk:%s failed to get node from hashring:%v", datasourceId, pk, err)
+		if errors.Is(err, consistent.ErrEmptyCircle) {
+			logger.Debugf("rule id:%s is not work, datasource id:%d is not assigned to active alert engine", pk, datasourceId)
+		} else {
+			logger.Debugf("rule id:%s is not work, datasource id:%d failed to get node from hashring:%v", pk, datasourceId, err)
+		}
 		return false
 	}
 	return node == currentNode
 }
 
 func (chr *DatasourceHashRingType) Set(datasourceId int64, r *consistent.Consistent) {
-	chr.RLock()
-	defer chr.RUnlock()
+	chr.Lock()
+	defer chr.Unlock()
 	chr.Rings[datasourceId] = r
 }
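Review note: the switch from `RLock` to `Lock` in `GetNode` and `Set` matters because both methods can write to the `Rings` map. Several goroutines may hold a read lock at the same time, so a map write under `RLock` is a data race. The sketch below shows the get-or-create pattern under the write lock, plus a sentinel-error check with `errors.Is`. The `ring` type, its toy placement function, and `ErrEmptyRing` are assumptions standing in for `consistent.Consistent` and `consistent.ErrEmptyCircle`; this version of `IsHit` returns the error instead of logging it.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrEmptyRing is a hypothetical sentinel, playing the role of
// consistent.ErrEmptyCircle in the diff.
var ErrEmptyRing = errors.New("empty ring")

// ring is a toy replacement for a consistent-hash ring.
type ring struct {
	nodes []string
}

// Get picks a node by summing the key bytes; real rings hash differently.
func (r *ring) Get(key string) (string, error) {
	if len(r.nodes) == 0 {
		return "", ErrEmptyRing
	}
	sum := 0
	for i := 0; i < len(key); i++ {
		sum += int(key[i])
	}
	return r.nodes[sum%len(r.nodes)], nil
}

// Rings maps a datasource id to its ring.
type Rings struct {
	mu   sync.RWMutex
	byID map[int64]*ring
}

func NewRings() *Rings {
	return &Rings{byID: make(map[int64]*ring)}
}

// Set replaces a ring; it writes to the map, so it takes the write lock.
func (rs *Rings) Set(id int64, nodes []string) {
	rs.mu.Lock()
	defer rs.mu.Unlock()
	rs.byID[id] = &ring{nodes: nodes}
}

// GetNode may lazily create a ring, which is also a map write,
// so a read lock would not be enough here.
func (rs *Rings) GetNode(id int64, key string) (string, error) {
	rs.mu.Lock()
	defer rs.mu.Unlock()
	r, ok := rs.byID[id]
	if !ok {
		r = &ring{}
		rs.byID[id] = r
	}
	node, err := r.Get(key)
	if err != nil {
		return "", fmt.Errorf("datasource %d: %w", id, err)
	}
	return node, nil
}

// IsHit reports whether current owns key, passing the lookup error through.
func (rs *Rings) IsHit(id int64, key, current string) (bool, error) {
	node, err := rs.GetNode(id, key)
	if err != nil {
		return false, err
	}
	return node == current, nil
}

func main() {
	rs := NewRings()
	if _, err := rs.IsHit(1, "rule-42", "engine-a"); errors.Is(err, ErrEmptyRing) {
		fmt.Println("datasource 1 has no active alert engine")
	}
	rs.Set(1, []string{"engine-a"})
	hit, _ := rs.IsHit(1, "rule-42", "engine-a")
	fmt.Println(hit)
}
```

Running this kind of code with `go test -race` or `go run -race` is the usual way to confirm that a lock change like the one in the diff removes the race.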
@@ -9,6 +9,7 @@ import (
|
||||
"github.com/ccfos/nightingale/v6/alert/aconf"
|
||||
"github.com/ccfos/nightingale/v6/models"
|
||||
"github.com/ccfos/nightingale/v6/pkg/ctx"
|
||||
"github.com/ccfos/nightingale/v6/pkg/poster"
|
||||
|
||||
"github.com/toolkits/pkg/logger"
|
||||
)
|
||||
@@ -16,14 +17,12 @@ import (
|
||||
type Naming struct {
|
||||
ctx *ctx.Context
|
||||
heartbeatConfig aconf.HeartbeatConfig
|
||||
isCenter bool
|
||||
}
|
||||
|
||||
func NewNaming(ctx *ctx.Context, heartbeat aconf.HeartbeatConfig, isCenter bool) *Naming {
|
||||
func NewNaming(ctx *ctx.Context, heartbeat aconf.HeartbeatConfig) *Naming {
|
||||
naming := &Naming{
|
||||
ctx: ctx,
|
||||
heartbeatConfig: heartbeat,
|
||||
isCenter: isCenter,
|
||||
}
|
||||
naming.Heartbeats()
|
||||
return naming
|
||||
@@ -45,6 +44,10 @@ func (n *Naming) Heartbeats() error {
|
||||
}
|
||||
|
||||
func (n *Naming) loopDeleteInactiveInstances() {
|
||||
if !n.ctx.IsCenter {
|
||||
return
|
||||
}
|
||||
|
||||
interval := time.Duration(10) * time.Minute
|
||||
for {
|
||||
time.Sleep(interval)
|
||||
@@ -74,7 +77,7 @@ func (n *Naming) heartbeat() error {
|
||||
var err error
|
||||
|
||||
// 在页面上维护实例和集群的对应关系
|
||||
datasourceIds, err = models.GetDatasourceIdsByClusterName(n.ctx, n.heartbeatConfig.EngineName)
|
||||
datasourceIds, err = models.GetDatasourceIdsByEngineName(n.ctx, n.heartbeatConfig.EngineName)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
@@ -112,7 +115,7 @@ func (n *Naming) heartbeat() error {
|
||||
localss[datasourceIds[i]] = newss
|
||||
}
|
||||
|
||||
if n.isCenter {
|
||||
if n.ctx.IsCenter {
|
||||
// 如果是中心节点,还需要处理 host 类型的告警规则,host 类型告警规则,和数据源无关,想复用下数据源的 hash ring,想用一个虚假的数据源 id 来处理
|
||||
// if is center node, we need to handle host type alerting rules, host type alerting rules are not related to datasource, we want to reuse the hash ring of datasource, we want to use a fake datasource id to handle it
|
||||
err := models.AlertingEngineHeartbeatWithCluster(n.ctx, n.heartbeatConfig.Endpoint, n.heartbeatConfig.EngineName, HostDatasource)
|
||||
@@ -146,6 +149,11 @@ func (n *Naming) ActiveServers(datasourceId int64) ([]string, error) {
|
||||
return nil, fmt.Errorf("cluster is empty")
|
||||
}
|
||||
|
||||
if !n.ctx.IsCenter {
|
||||
lst, err := poster.GetByUrls[[]string](n.ctx, "/v1/n9e/servers-active?dsid="+fmt.Sprintf("%d", datasourceId))
|
||||
return lst, err
|
||||
}
|
||||
|
||||
// 30秒内有心跳,就认为是活的
|
||||
return models.AlertingEngineGetsInstances(n.ctx, "datasource_id = ? and clock > ?", datasourceId, time.Now().Unix()-30)
|
||||
}
|
||||
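Review note: the comment above the final query in `ActiveServers` says an engine counts as alive if it has sent a heartbeat within the last 30 seconds, and the query uses a strict `clock > now-30` comparison. The sketch below reproduces that filter in memory. `Instance` and `ActiveEndpoints` are hypothetical names; the real code filters in SQL on the center and, per the hunk, asks the center over `/v1/n9e/servers-active` when running on an edge instance.

```go
package main

import (
	"fmt"
	"time"
)

// Instance is a hypothetical row from the alerting-engine table.
type Instance struct {
	Endpoint string
	Clock    int64 // unix seconds of the last heartbeat
}

// ActiveEndpoints keeps instances whose last heartbeat is strictly newer
// than nowUnix - windowSec, matching the "clock > ?" condition in the diff.
func ActiveEndpoints(instances []Instance, nowUnix, windowSec int64) []string {
	cutoff := nowUnix - windowSec
	var active []string
	for _, in := range instances {
		if in.Clock > cutoff {
			active = append(active, in.Endpoint)
		}
	}
	return active
}

func main() {
	now := time.Now().Unix()
	instances := []Instance{
		{Endpoint: "10.0.0.1:17000", Clock: now - 5},
		{Endpoint: "10.0.0.2:17000", Clock: now - 90},
	}
	fmt.Println(ActiveEndpoints(instances, now, 30))
}
```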
|
||||
@@ -113,13 +113,12 @@ func (p *Processor) Handle(anomalyPoints []common.AnomalyPoint, from string, inh
|
||||
// 这些信息的修改是不会引起worker restart的,但是确实会影响告警处理逻辑
|
||||
// 所以,这里直接从memsto.AlertRuleCache中获取并覆盖
|
||||
p.inhibit = inhibit
|
||||
p.rule = p.atertRuleCache.Get(p.rule.Id)
|
||||
cachedRule := p.rule
|
||||
cachedRule := p.atertRuleCache.Get(p.rule.Id)
|
||||
if cachedRule == nil {
|
||||
logger.Errorf("rule not found %+v", anomalyPoints)
|
||||
return
|
||||
}
|
||||
|
||||
p.rule = cachedRule
|
||||
now := time.Now().Unix()
|
||||
alertingKeys := map[string]struct{}{}
|
||||
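Review note: the `Handle` hunk changes the order of the cache lookup and the nil check. Before, `p.rule` was overwritten with the cache result first, so a deleted rule left `p.rule` nil and a later `p.rule.Id` would dereference nil. After, the result is checked before it is assigned. A minimal sketch of the safe ordering, with hypothetical `Rule`, `RuleCache`, and `Refresh` names that are not taken from the project:

```go
package main

import "fmt"

// Rule is a hypothetical stand-in for models.AlertRule.
type Rule struct {
	ID   int64
	Name string
}

// RuleCache returns nil for rules that no longer exist.
type RuleCache struct {
	rules map[int64]*Rule
}

func (c *RuleCache) Get(id int64) *Rule {
	return c.rules[id]
}

type Processor struct {
	rule  *Rule
	cache *RuleCache
}

// Refresh adopts the cached rule only if it still exists.
// It reports false and leaves p.rule untouched otherwise.
func (p *Processor) Refresh() bool {
	cached := p.cache.Get(p.rule.ID)
	if cached == nil {
		return false
	}
	p.rule = cached
	return true
}

func main() {
	cache := &RuleCache{rules: map[int64]*Rule{
		7: &Rule{ID: 7, Name: "cpu high v2"},
	}}
	p := &Processor{rule: &Rule{ID: 7, Name: "cpu high"}, cache: cache}

	fmt.Println(p.Refresh(), p.rule.Name)

	delete(cache.rules, 7) // simulate the rule being removed
	fmt.Println(p.Refresh(), p.rule.Name)
}
```

Because `p.rule` is never set to nil, a second call after the rule disappears still has a valid `p.rule.ID` to look up.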
|
||||
@@ -338,7 +337,7 @@ func (p *Processor) pushEventToQueue(e *models.AlertCurEvent) {
|
||||
func (p *Processor) RecoverAlertCurEventFromDb() {
|
||||
p.pendings = NewAlertCurEventMap(nil)
|
||||
|
||||
curEvents, err := models.AlertCurEventGetByRuleIdAndCluster(p.ctx, p.rule.Id, p.datasourceId)
|
||||
curEvents, err := models.AlertCurEventGetByRuleIdAndDsId(p.ctx, p.rule.Id, p.datasourceId)
|
||||
if err != nil {
|
||||
logger.Errorf("recover event from db for rule:%s failed, err:%s", p.Key(), err)
|
||||
p.fires = NewAlertCurEventMap(nil)
|
||||
@@ -39,15 +39,16 @@ func New(httpConfig httpx.Config, alert aconf.Alert, amc *memsto.AlertMuteCacheT
 }
 
 func (rt *Router) Config(r *gin.Engine) {
-	if !rt.HTTP.Alert.Enable {
+	if !rt.HTTP.APIForService.Enable {
 		return
 	}
 
 	service := r.Group("/v1/n9e")
-	if len(rt.HTTP.Alert.BasicAuth) > 0 {
-		service.Use(gin.BasicAuth(rt.HTTP.Alert.BasicAuth))
+	if len(rt.HTTP.APIForService.BasicAuth) > 0 {
+		service.Use(gin.BasicAuth(rt.HTTP.APIForService.BasicAuth))
 	}
 	service.POST("/event", rt.pushEventToQueue)
+	service.POST("/event-persist", rt.eventPersist)
 	service.POST("/make-event", rt.makeEvent)
 }
 
@@ -83,6 +83,13 @@ func (rt *Router) pushEventToQueue(c *gin.Context) {
 	ginx.NewRender(c).Message(nil)
 }
 
+func (rt *Router) eventPersist(c *gin.Context) {
+	var event *models.AlertCurEvent
+	ginx.BindJSON(c, &event)
+	event.FE2DB()
+	ginx.NewRender(c).Message(models.EventPersist(rt.Ctx, event))
+}
+
 type eventForm struct {
 	Alert bool `json:"alert"`
 	AnomalyPoints []common.AnomalyPoint `json:"vectors"`
@@ -15,7 +15,7 @@ import (
|
||||
"github.com/toolkits/pkg/logger"
|
||||
)
|
||||
|
||||
func SendCallbacks(ctx *ctx.Context, urls []string, event *models.AlertCurEvent, targetCache *memsto.TargetCacheType, ibexConf aconf.Ibex) {
|
||||
func SendCallbacks(ctx *ctx.Context, urls []string, event *models.AlertCurEvent, targetCache *memsto.TargetCacheType, userCache *memsto.UserCacheType, ibexConf aconf.Ibex) {
|
||||
for _, url := range urls {
|
||||
if url == "" {
|
||||
continue
|
||||
@@ -23,7 +23,7 @@ func SendCallbacks(ctx *ctx.Context, urls []string, event *models.AlertCurEvent,
|
||||
|
||||
if strings.HasPrefix(url, "${ibex}") {
|
||||
if !event.IsRecovered {
|
||||
handleIbex(ctx, url, event, targetCache, ibexConf)
|
||||
handleIbex(ctx, url, event, targetCache, userCache, ibexConf)
|
||||
}
|
||||
continue
|
||||
}
|
||||
@@ -34,9 +34,9 @@ func SendCallbacks(ctx *ctx.Context, urls []string, event *models.AlertCurEvent,
|
||||
|
||||
resp, code, err := poster.PostJSON(url, 5*time.Second, event, 3)
|
||||
if err != nil {
|
||||
logger.Errorf("event_callback(rule_id=%d url=%s) fail, resp: %s, err: %v, code: %d", event.RuleId, url, string(resp), err, code)
|
||||
logger.Errorf("event_callback_fail(rule_id=%d url=%s), resp: %s, err: %v, code: %d", event.RuleId, url, string(resp), err, code)
|
||||
} else {
|
||||
logger.Infof("event_callback(rule_id=%d url=%s) succ, resp: %s, code: %d", event.RuleId, url, string(resp), code)
|
||||
logger.Infof("event_callback_succ(rule_id=%d url=%s), resp: %s, code: %d", event.RuleId, url, string(resp), code)
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -60,7 +60,7 @@ type TaskCreateReply struct {
|
||||
Dat int64 `json:"dat"` // task.id
|
||||
}
|
||||
|
||||
func handleIbex(ctx *ctx.Context, url string, event *models.AlertCurEvent, targetCache *memsto.TargetCacheType, ibexConf aconf.Ibex) {
|
||||
func handleIbex(ctx *ctx.Context, url string, event *models.AlertCurEvent, targetCache *memsto.TargetCacheType, userCache *memsto.UserCacheType, ibexConf aconf.Ibex) {
|
||||
arr := strings.Split(url, "/")
|
||||
|
||||
var idstr string
|
||||
@@ -103,7 +103,7 @@ func handleIbex(ctx *ctx.Context, url string, event *models.AlertCurEvent, targe
|
||||
|
||||
// check perm
|
||||
// tpl.GroupId - host - account 三元组校验权限
|
||||
can, err := canDoIbex(ctx, tpl.UpdateBy, tpl, host, targetCache)
|
||||
can, err := canDoIbex(ctx, tpl.UpdateBy, tpl, host, targetCache, userCache)
|
||||
if err != nil {
|
||||
logger.Errorf("event_callback_ibex: check perm fail: %v", err)
|
||||
return
|
||||
@@ -154,6 +154,7 @@ func handleIbex(ctx *ctx.Context, url string, event *models.AlertCurEvent, targe
|
||||
// write db
|
||||
record := models.TaskRecord{
|
||||
Id: res.Dat,
|
||||
EventId: event.Id,
|
||||
GroupId: tpl.GroupId,
|
||||
IbexAddress: ibexConf.Address,
|
||||
IbexAuthUser: ibexConf.BasicAuthUser,
|
||||
@@ -175,12 +176,8 @@ func handleIbex(ctx *ctx.Context, url string, event *models.AlertCurEvent, targe
}
}

func canDoIbex(ctx *ctx.Context, username string, tpl *models.TaskTpl, host string, targetCache *memsto.TargetCacheType) (bool, error) {
user, err := models.UserGetByUsername(ctx, username)
if err != nil {
return false, err
}

func canDoIbex(ctx *ctx.Context, username string, tpl *models.TaskTpl, host string, targetCache *memsto.TargetCacheType, userCache *memsto.UserCacheType) (bool, error) {
user := userCache.GetByUsername(username)
if user != nil && user.IsAdmin() {
return true, nil
}

@@ -49,7 +49,7 @@ func (ds *DingtalkSender) Send(ctx MessageContext) {
body = dingtalk{
Msgtype: "markdown",
Markdown: dingtalkMarkdown{
Title: ctx.Rule.Name,
Title: ctx.Event.RuleName,
Text: message,
},
}
@@ -57,7 +57,7 @@ func (ds *DingtalkSender) Send(ctx MessageContext) {
body = dingtalk{
Msgtype: "markdown",
Markdown: dingtalkMarkdown{
Title: ctx.Rule.Name,
Title: ctx.Event.RuleName,
Text: message + "\n" + strings.Join(ats, " "),
},
At: dingtalkAt{

@@ -31,7 +31,7 @@ func (es *EmailSender) Send(ctx MessageContext) {
if es.subjectTpl != nil {
subject = BuildTplMessage(es.subjectTpl, ctx.Event)
} else {
subject = ctx.Rule.Name
subject = ctx.Event.RuleName
}
content := BuildTplMessage(es.contentTpl, ctx.Event)
es.WriteEmail(subject, content, tos)

@@ -35,7 +35,7 @@ func alertingCallScript(stdinBytes []byte, notifyScript models.NotifyScript) {
if file.IsExist(fpath) {
oldContent, err := file.ToString(fpath)
if err != nil {
logger.Errorf("event_notify: read script file err: %v", err)
logger.Errorf("event_script_notify_fail: read script file err: %v", err)
return
}

@@ -47,13 +47,13 @@ func alertingCallScript(stdinBytes []byte, notifyScript models.NotifyScript) {
if rewrite {
_, err := file.WriteString(fpath, config.Content)
if err != nil {
logger.Errorf("event_notify: write script file err: %v", err)
logger.Errorf("event_script_notify_fail: write script file err: %v", err)
return
}

err = os.Chmod(fpath, 0777)
if err != nil {
logger.Errorf("event_notify: chmod script file err: %v", err)
logger.Errorf("event_script_notify_fail: chmod script file err: %v", err)
return
}
}
@@ -70,7 +70,7 @@ func alertingCallScript(stdinBytes []byte, notifyScript models.NotifyScript) {

err := startCmd(cmd)
if err != nil {
logger.Errorf("event_notify: run cmd err: %v", err)
logger.Errorf("event_script_notify_fail: run cmd err: %v", err)
return
}

@@ -78,20 +78,20 @@ func alertingCallScript(stdinBytes []byte, notifyScript models.NotifyScript) {

if isTimeout {
if err == nil {
logger.Errorf("event_notify: timeout and killed process %s", fpath)
logger.Errorf("event_script_notify_fail: timeout and killed process %s", fpath)
}

if err != nil {
logger.Errorf("event_notify: kill process %s occur error %v", fpath, err)
logger.Errorf("event_script_notify_fail: kill process %s occur error %v", fpath, err)
}

return
}

if err != nil {
logger.Errorf("event_notify: exec script %s occur error: %v, output: %s", fpath, err, buf.String())
logger.Errorf("event_script_notify_fail: exec script %s occur error: %v, output: %s", fpath, err, buf.String())
return
}

logger.Infof("event_notify: exec %s output: %s", fpath, buf.String())
logger.Infof("event_script_notify_ok: exec %s output: %s", fpath, buf.String())
}

@@ -54,6 +54,7 @@ func BuildTplMessage(tpl *template.Template, event *models.AlertCurEvent) string
if tpl == nil {
return "tpl for current sender not found, please check configuration"
}

var body bytes.Buffer
if err := tpl.Execute(&body, event); err != nil {
return err.Error()

@@ -53,7 +53,7 @@ func SendWebhooks(webhooks []*models.Webhook, event *models.AlertCurEvent) {
var resp *http.Response
resp, err = client.Do(req)
if err != nil {
logger.Warningf("WebhookCallError, ruleId: [%d], eventId: [%d], url: [%s], error: [%s]", event.RuleId, event.Id, conf.Url, err)
logger.Errorf("event_webhook_fail, ruleId: [%d], eventId: [%d], url: [%s], error: [%s]", event.RuleId, event.Id, conf.Url, err)
continue
}

@@ -63,6 +63,6 @@ func SendWebhooks(webhooks []*models.Webhook, event *models.AlertCurEvent) {
body, _ = ioutil.ReadAll(resp.Body)
}

logger.Debugf("alertingWebhook done, url: %s, response code: %d, body: %s", conf.Url, resp.StatusCode, string(body))
logger.Debugf("event_webhook_succ, url: %s, response code: %d, body: %s", conf.Url, resp.StatusCode, string(body))
}
}

@@ -1,18 +1,12 @@
package cconf

import (
"github.com/gin-gonic/gin"
)

type Center struct {
Plugins []Plugin
BasicAuth gin.Accounts
MetricsYamlFile string
OpsYamlFile string
BuiltinIntegrationsDir string
I18NHeaderKey string
MetricDesc MetricDescType
TargetMetrics map[string]string
AnonymousAccess AnonymousAccess
}

@@ -4,7 +4,6 @@ import (
"path"

"github.com/toolkits/pkg/file"
"github.com/toolkits/pkg/runner"
)

// metricDesc , As load map happens before read map, there is no necessary to use concurrent map for metric desc store
@@ -33,10 +32,10 @@ func GetMetricDesc(lang, metric string) string {
return MetricDesc.CommonDesc[metric]
}

func LoadMetricsYaml(metricsYamlFile string) error {
func LoadMetricsYaml(configDir, metricsYamlFile string) error {
fp := metricsYamlFile
if fp == "" {
fp = path.Join(runner.Cwd, "etc", "metrics.yaml")
fp = path.Join(configDir, "metrics.yaml")
}
if !file.IsExist(fp) {
return nil

@@ -4,7 +4,6 @@ import (
"path"

"github.com/toolkits/pkg/file"
"github.com/toolkits/pkg/runner"
)

var Operations = Operation{}
@@ -19,10 +18,10 @@ type Ops struct {
Ops []string `yaml:"ops" json:"ops"`
}

func LoadOpsYaml(opsYamlFile string) error {
func LoadOpsYaml(configDir string, opsYamlFile string) error {
fp := opsYamlFile
if fp == "" {
fp = path.Join(runner.Cwd, "etc", "ops.yaml")
fp = path.Join(configDir, "ops.yaml")
}
if !file.IsExist(fp) {
return nil

@@ -33,8 +33,8 @@ func Initialize(configDir string, cryptoKey string) (func(), error) {
return nil, fmt.Errorf("failed to init config: %v", err)
}

cconf.LoadMetricsYaml(config.Center.MetricsYamlFile)
cconf.LoadOpsYaml(config.Center.OpsYamlFile)
cconf.LoadMetricsYaml(configDir, config.Center.MetricsYamlFile)
cconf.LoadOpsYaml(configDir, config.Center.OpsYamlFile)

logxClean, err := logx.Init(config.Log)
if err != nil {
@@ -47,7 +47,7 @@ func Initialize(configDir string, cryptoKey string) (func(), error) {
if err != nil {
return nil, err
}
ctx := ctx.NewContext(context.Background(), db)
ctx := ctx.NewContext(context.Background(), db, true)
models.InitRoot(ctx)

redis, err := storage.NewRedis(config.Redis)
@@ -56,7 +56,7 @@ func Initialize(configDir string, cryptoKey string) (func(), error) {
}

metas := metas.New(redis)
idents := idents.New(db)
idents := idents.New(ctx)

syncStats := memsto.NewSyncStats()
alertStats := astats.NewSyncStats()
@@ -73,12 +73,12 @@ func Initialize(configDir string, cryptoKey string) (func(), error) {
promClients := prom.NewPromClient(ctx, config.Alert.Heartbeat)

externalProcessors := process.NewExternalProcessors()
alert.Start(config.Alert, config.Pushgw, syncStats, alertStats, externalProcessors, targetCache, busiGroupCache, alertMuteCache, alertRuleCache, notifyConfigCache, dsCache, ctx, promClients, true)
alert.Start(config.Alert, config.Pushgw, syncStats, alertStats, externalProcessors, targetCache, busiGroupCache, alertMuteCache, alertRuleCache, notifyConfigCache, dsCache, ctx, promClients)

writers := writer.NewWriters(config.Pushgw)

alertrtRouter := alertrt.New(config.HTTP, config.Alert, alertMuteCache, targetCache, busiGroupCache, alertStats, ctx, externalProcessors)
centerRouter := centerrt.New(config.HTTP, config.Center, cconf.Operations, dsCache, notifyConfigCache, promClients, redis, sso, ctx, metas)
centerRouter := centerrt.New(config.HTTP, config.Center, cconf.Operations, dsCache, notifyConfigCache, promClients, redis, sso, ctx, metas, targetCache)
pushgwRouter := pushgwrt.New(config.HTTP, config.Pushgw, targetCache, busiGroupCache, idents, writers, ctx)

r := httpx.GinEngine(config.Global.RunMode, config.HTTP)

@@ -3,7 +3,6 @@ package router
import (
"fmt"
"net/http"
"path"
"strings"
"time"

@@ -11,15 +10,17 @@ import (
"github.com/ccfos/nightingale/v6/center/cstats"
"github.com/ccfos/nightingale/v6/center/metas"
"github.com/ccfos/nightingale/v6/center/sso"
_ "github.com/ccfos/nightingale/v6/front/statik"
"github.com/ccfos/nightingale/v6/memsto"
"github.com/ccfos/nightingale/v6/pkg/aop"
"github.com/ccfos/nightingale/v6/pkg/ctx"
"github.com/ccfos/nightingale/v6/pkg/httpx"
"github.com/ccfos/nightingale/v6/prom"
"github.com/ccfos/nightingale/v6/storage"
"github.com/toolkits/pkg/runner"

"github.com/gin-gonic/gin"
"github.com/rakyll/statik/fs"
"github.com/toolkits/pkg/logger"
)

type Router struct {
@@ -31,12 +32,13 @@ type Router struct {
PromClients *prom.PromClientMap
Redis storage.Redis
MetaSet *metas.Set
TargetCache *memsto.TargetCacheType
Sso *sso.SsoClient
Ctx *ctx.Context
}

func New(httpConfig httpx.Config, center cconf.Center, operations cconf.Operation, ds *memsto.DatasourceCacheType, ncc *memsto.NotifyConfigCacheType,
pc *prom.PromClientMap, redis storage.Redis, sso *sso.SsoClient, ctx *ctx.Context, metaSet *metas.Set) *Router {
pc *prom.PromClientMap, redis storage.Redis, sso *sso.SsoClient, ctx *ctx.Context, metaSet *metas.Set, tc *memsto.TargetCacheType) *Router {
return &Router{
HTTP: httpConfig,
Center: center,
@@ -46,6 +48,7 @@ func New(httpConfig httpx.Config, center cconf.Center, operations cconf.Operatio
PromClients: pc,
Redis: redis,
MetaSet: metaSet,
TargetCache: tc,
Sso: sso,
Ctx: ctx,
}
@@ -86,32 +89,32 @@ func languageDetector(i18NHeaderKey string) gin.HandlerFunc {
}
}

func (rt *Router) configNoRoute(r *gin.Engine) {
func (rt *Router) configNoRoute(r *gin.Engine, fs *http.FileSystem) {
r.NoRoute(func(c *gin.Context) {
arr := strings.Split(c.Request.URL.Path, ".")
suffix := arr[len(arr)-1]

switch suffix {
case "png", "jpeg", "jpg", "svg", "ico", "gif", "css", "js", "html", "htm", "gz", "zip", "map":
cwdarr := []string{"/"}
cwdarr = append(cwdarr, strings.Split(runner.Cwd, "/")...)
cwdarr = append(cwdarr, "pub")
cwdarr = append(cwdarr, strings.Split(c.Request.URL.Path, "/")...)
c.File(path.Join(cwdarr...))
c.FileFromFS(c.Request.URL.Path, *fs)
default:
cwdarr := []string{"/"}
cwdarr = append(cwdarr, strings.Split(runner.Cwd, "/")...)
cwdarr = append(cwdarr, "pub")
cwdarr = append(cwdarr, "index.html")
c.File(path.Join(cwdarr...))
c.FileFromFS("/", *fs)
}
})
}

func (rt *Router) Config(r *gin.Engine) {

r.Use(stat())
r.Use(languageDetector(rt.Center.I18NHeaderKey))
r.Use(aop.Recovery())

statikFS, err := fs.New()
if err != nil {
logger.Errorf("cannot create statik fs: %v", err)
}
r.StaticFS("/pub", statikFS)

pagesPrefix := "/api/n9e"
pages := r.Group(pagesPrefix)
{
@@ -139,6 +142,7 @@ func (rt *Router) Config(r *gin.Engine) {
pages.GET("/auth/callback", rt.loginCallback)
pages.GET("/auth/callback/cas", rt.loginCallbackCas)
pages.GET("/auth/callback/oauth", rt.loginCallbackOAuth)
pages.GET("/auth/perms", rt.allPerms)

pages.GET("/metrics/desc", rt.metricsDescGetFile)
pages.POST("/metrics/desc", rt.metricsDescGetMap)
@@ -294,7 +298,7 @@ func (rt *Router) Config(r *gin.Engine) {

pages.GET("/role/:id/ops", rt.auth(), rt.admin(), rt.operationOfRole)
pages.PUT("/role/:id/ops", rt.auth(), rt.admin(), rt.roleBindOperation)
pages.GET("operation", rt.operations)
pages.GET("/operation", rt.operations)

pages.GET("/notify-tpls", rt.auth(), rt.admin(), rt.notifyTplGets)
pages.PUT("/notify-tpl/content", rt.auth(), rt.admin(), rt.notifyTplUpdateContent)
@@ -320,17 +324,20 @@ func (rt *Router) Config(r *gin.Engine) {
pages.PUT("/notify-config", rt.auth(), rt.admin(), rt.notifyConfigPut)
}

if rt.HTTP.Service.Enable {
if rt.HTTP.APIForService.Enable {
service := r.Group("/v1/n9e")
if len(rt.HTTP.Service.BasicAuth) > 0 {
service.Use(gin.BasicAuth(rt.HTTP.Service.BasicAuth))
if len(rt.HTTP.APIForService.BasicAuth) > 0 {
service.Use(gin.BasicAuth(rt.HTTP.APIForService.BasicAuth))
}
{
service.Any("/prometheus/*url", rt.dsProxy)
service.POST("/users", rt.userAddPost)
service.GET("/users", rt.userFindAll)

service.GET("/targets", rt.targetGets)
service.GET("/user-groups", rt.userGroupGetsByService)
service.GET("/user-group-members", rt.userGroupMemberGetsByService)

service.GET("/targets", rt.targetGetsByService)
service.GET("/targets/tags", rt.targetGetTags)
service.POST("/targets/tags", rt.targetBindTagsByService)
service.DELETE("/targets/tags", rt.targetUnbindTagsByService)
@@ -342,36 +349,56 @@ func (rt *Router) Config(r *gin.Engine) {
service.GET("/alert-rule/:arid", rt.alertRuleGet)
service.GET("/alert-rules", rt.alertRulesGetByService)

service.GET("/alert-subscribes", rt.alertSubscribeGetsByService)

service.GET("/busi-groups", rt.busiGroupGetsByService)

service.GET("/datasources", rt.datasourceGetsByService)
service.GET("/datasource-ids", rt.getDatasourceIds)
service.POST("/server-heartbeat", rt.serverHeartbeat)
service.GET("/servers-active", rt.serversActive)

service.GET("/recording-rules", rt.recordingRuleGetsByService)

service.GET("/alert-mutes", rt.alertMuteGets)
service.POST("/alert-mutes", rt.alertMuteAddByService)
service.DELETE("/alert-mutes", rt.alertMuteDel)

service.GET("/alert-cur-events", rt.alertCurEventsList)
service.GET("/alert-cur-events-get-by-rid", rt.alertCurEventsGetByRid)
service.GET("/alert-his-events", rt.alertHisEventsList)
service.GET("/alert-his-event/:eid", rt.alertHisEventGet)

service.GET("/config/:id", rt.configGet)
service.GET("/configs", rt.configsGet)
service.GET("/config", rt.configGetByKey)
service.PUT("/configs", rt.configsPut)
service.POST("/configs", rt.configsPost)
service.DELETE("/configs", rt.configsDel)

service.POST("/conf-prop/encrypt", rt.confPropEncrypt)
service.POST("/conf-prop/decrypt", rt.confPropDecrypt)

service.GET("/statistic", rt.statistic)

service.GET("/notify-tpls", rt.notifyTplGets)

service.POST("/task-record-add", rt.taskRecordAdd)
}
}

if rt.HTTP.Heartbeat.Enable {
if rt.HTTP.APIForAgent.Enable {
heartbeat := r.Group("/v1/n9e")
{
if len(rt.HTTP.Heartbeat.BasicAuth) > 0 {
heartbeat.Use(gin.BasicAuth(rt.HTTP.Heartbeat.BasicAuth))
if len(rt.HTTP.APIForAgent.BasicAuth) > 0 {
heartbeat.Use(gin.BasicAuth(rt.HTTP.APIForAgent.BasicAuth))
}
heartbeat.POST("/heartbeat", rt.heartbeat)
}
}

rt.configNoRoute(r)
rt.configNoRoute(r, &statikFS)

}

func Render(c *gin.Context, data, msg interface{}) {

@@ -128,6 +128,13 @@ func (rt *Router) alertCurEventsCardDetails(c *gin.Context) {
ginx.NewRender(c).Data(list, err)
}

// alertCurEventsGetByRid
func (rt *Router) alertCurEventsGetByRid(c *gin.Context) {
rid := ginx.QueryInt64(c, "rid")
dsId := ginx.QueryInt64(c, "dsid")
ginx.NewRender(c).Data(models.AlertCurEventGetByRuleIdAndDsId(rt.Ctx, rid, dsId))
}

// list mode: fetch active alerts
func (rt *Router) alertCurEventsList(c *gin.Context) {
stime, etime := getTimeRange(c)

@@ -27,7 +27,12 @@ func (rt *Router) alertRuleGets(c *gin.Context) {
}

func (rt *Router) alertRulesGetByService(c *gin.Context) {
prods := strings.Split(ginx.QueryStr(c, "prods", ""), ",")
prods := []string{}
prodStr := ginx.QueryStr(c, "prods", "")
if prodStr != "" {
prods = strings.Split(ginx.QueryStr(c, "prods", ""), ",")
}

query := ginx.QueryStr(c, "query", "")
algorithm := ginx.QueryStr(c, "algorithm", "")
cluster := ginx.QueryStr(c, "cluster", "")

@@ -110,3 +110,8 @@ func (rt *Router) alertSubscribeDel(c *gin.Context) {

ginx.NewRender(c).Message(models.AlertSubscribeDel(rt.Ctx, f.Ids))
}

func (rt *Router) alertSubscribeGetsByService(c *gin.Context) {
lst, err := models.AlertSubscribeGetsByService(rt.Ctx)
ginx.NewRender(c).Data(lst, err)
}

@@ -123,6 +123,11 @@ func (rt *Router) busiGroupGets(c *gin.Context) {
ginx.NewRender(c).Data(lst, err)
}

func (rt *Router) busiGroupGetsByService(c *gin.Context) {
lst, err := models.BusiGroupGetAll(rt.Ctx)
ginx.NewRender(c).Data(lst, err)
}

// this endpoint is only called from the active alerts page; it returns the number of active alerts for each BG
func (rt *Router) busiGroupAlertingsGets(c *gin.Context) {
ids := ginx.QueryStr(c, "ids", "")

@@ -20,6 +20,11 @@ func (rt *Router) configGet(c *gin.Context) {
ginx.NewRender(c).Data(configs, err)
}

func (rt *Router) configGetByKey(c *gin.Context) {
config, err := models.ConfigsGet(rt.Ctx, ginx.QueryStr(c, "key"))
ginx.NewRender(c).Data(config, err)
}

func (rt *Router) configsDel(c *gin.Context) {
var f idsForm
ginx.BindJSON(c, &f)

@@ -5,6 +5,7 @@ import (
"fmt"
"net/http"
"net/url"
"strings"

"github.com/ccfos/nightingale/v6/models"

@@ -35,6 +36,12 @@ func (rt *Router) datasourceList(c *gin.Context) {
Render(c, list, err)
}

func (rt *Router) datasourceGetsByService(c *gin.Context) {
typ := ginx.QueryStr(c, "typ", "")
lst, err := models.GetDatasourcesGetsBy(rt.Ctx, typ, "", "", "")
ginx.NewRender(c).Data(lst, err)
}

type datasourceBrief struct {
Id int64 `json:"id"`
Name string `json:"name"`
@@ -116,7 +123,13 @@ func DatasourceCheck(ds models.Datasource) error {
if ds.PluginType == models.PROMETHEUS {
subPath := "/api/v1/query"
query := url.Values{}
query.Add("query", "1+1")
if strings.Contains(fullURL, "loki") {
subPath = "/api/v1/labels"
query.Add("start", "1")
query.Add("end", "2")
} else {
query.Add("query", "1+1")
}
fullURL = fmt.Sprintf("%s%s?%s", ds.HTTPJson.Url, subPath, query.Encode())

req, err = http.NewRequest("POST", fullURL, nil)
@@ -172,6 +185,13 @@ func (rt *Router) datasourceDel(c *gin.Context) {
Render(c, nil, err)
}

func (rt *Router) getDatasourceIds(c *gin.Context) {
name := ginx.QueryStr(c, "name")
datasourceIds, err := models.GetDatasourceIdsByEngineName(rt.Ctx, name)

ginx.NewRender(c).Data(datasourceIds, err)
}

func Username(c *gin.Context) string {

return c.MustGet("username").(string)

@@ -17,6 +17,41 @@ import (

const defaultLimit = 300

func (rt *Router) statistic(c *gin.Context) {
name := ginx.QueryStr(c, "name")
var model interface{}
var err error
var statistics *models.Statistics
switch name {
case "alert_mute":
model = models.AlertMute{}
case "alert_rule":
model = models.AlertRule{}
case "alert_subscribe":
model = models.AlertSubscribe{}
case "busi_group":
model = models.BusiGroup{}
case "recording_rule":
model = models.RecordingRule{}
case "target":
model = models.Target{}
case "user":
model = models.User{}
case "user_group":
model = models.UserGroup{}
case "datasource":
// datasource update_at is different from others
statistics, err = models.DatasourceStatistics(rt.Ctx)
ginx.NewRender(c).Data(statistics, err)
return
default:
ginx.Bomb(http.StatusBadRequest, "invalid name")
}

statistics, err = models.StatisticsGet(rt.Ctx, model)
ginx.NewRender(c).Data(statistics, err)
}

func queryDatasourceIds(c *gin.Context) []int64 {
datasourceIds := ginx.QueryStr(c, "datasource_ids", "")
datasourceIds = strings.ReplaceAll(datasourceIds, ",", " ")

@@ -36,6 +36,17 @@ func (rt *Router) heartbeat(c *gin.Context) {
ginx.Dangerous(err)

req.Offset = (time.Now().UnixMilli() - req.UnixTime)
req.RemoteAddr = c.ClientIP()
rt.MetaSet.Set(req.Hostname, req)
ginx.NewRender(c).Message(nil)

gid := ginx.QueryInt64(c, "gid", 0)

if gid != 0 {
target, has := rt.TargetCache.Get(req.Hostname)
if has && target.GroupId != gid {
err = models.TargetUpdateBgid(rt.Ctx, []string{req.Hostname}, gid, false)
}
}

ginx.NewRender(c).Message(err)
}

@@ -21,7 +21,7 @@ func (rt *Router) alertMuteGetsByBG(c *gin.Context) {

func (rt *Router) alertMuteGets(c *gin.Context) {
prods := strings.Fields(ginx.QueryStr(c, "prods", ""))
bgid := ginx.QueryInt64(c, "bgid", 0)
bgid := ginx.QueryInt64(c, "bgid", -1)
query := ginx.QueryStr(c, "query", "")
lst, err := models.AlertMuteGets(rt.Ctx, prods, bgid, query)

@@ -30,9 +30,12 @@ func (rt *Router) webhookPuts(c *gin.Context) {
var webhooks []models.Webhook
ginx.BindJSON(c, &webhooks)
for i := 0; i < len(webhooks); i++ {
for k, v := range webhooks[i].HeaderMap {
webhooks[i].Headers = append(webhooks[i].Headers, k)
webhooks[i].Headers = append(webhooks[i].Headers, v)
webhooks[i].Headers = []string{}
if len(webhooks[i].HeaderMap) > 0 {
for k, v := range webhooks[i].HeaderMap {
webhooks[i].Headers = append(webhooks[i].Headers, k)
webhooks[i].Headers = append(webhooks[i].Headers, v)
}
}
}

@@ -3,7 +3,9 @@ package router
import (
"bytes"
"encoding/json"
"fmt"
"html/template"
"strings"

"github.com/ccfos/nightingale/v6/center/cconf"
"github.com/ccfos/nightingale/v6/models"
@@ -22,6 +24,11 @@ func (rt *Router) notifyTplUpdateContent(c *gin.Context) {
var f models.NotifyTpl
ginx.BindJSON(c, &f)

if err := templateValidate(f); err != nil {
ginx.NewRender(c).Message(err.Error())
return
}

ginx.NewRender(c).Message(f.UpdateContent(rt.Ctx))
}

@@ -29,9 +36,32 @@ func (rt *Router) notifyTplUpdate(c *gin.Context) {
var f models.NotifyTpl
ginx.BindJSON(c, &f)

if err := templateValidate(f); err != nil {
ginx.NewRender(c).Message(err.Error())
return
}

ginx.NewRender(c).Message(f.Update(rt.Ctx))
}

func templateValidate(f models.NotifyTpl) error {
if f.Content == "" {
return nil
}

var defs = []string{
"{{$labels := .TagsMap}}",
"{{$value := .TriggerValue}}",
}
text := strings.Join(append(defs, f.Content), "")

if _, err := template.New(f.Channel).Funcs(tplx.TemplateFuncMap).Parse(text); err != nil {
return fmt.Errorf("notify template verify illegal:%s", err.Error())
}

return nil
}

func (rt *Router) notifyTplPreview(c *gin.Context) {
var event models.AlertCurEvent
err := json.Unmarshal([]byte(cconf.EVENT_EXAMPLE), &event)
@@ -43,9 +73,29 @@ func (rt *Router) notifyTplPreview(c *gin.Context) {
var f models.NotifyTpl
ginx.BindJSON(c, &f)

tpl, err := template.New(f.Channel).Funcs(tplx.TemplateFuncMap).Parse(f.Content)
var defs = []string{
"{{$labels := .TagsMap}}",
"{{$value := .TriggerValue}}",
}
text := strings.Join(append(defs, f.Content), "")
tpl, err := template.New(f.Channel).Funcs(tplx.TemplateFuncMap).Parse(text)
ginx.Dangerous(err)

event.TagsMap = make(map[string]string)
for i := 0; i < len(event.TagsJSON); i++ {
pair := strings.TrimSpace(event.TagsJSON[i])
if pair == "" {
continue
}

arr := strings.Split(pair, "=")
if len(arr) != 2 {
continue
}

event.TagsMap[arr[0]] = arr[1]
}

var body bytes.Buffer
var ret string
if err := tpl.Execute(&body, event); err != nil {

@@ -19,6 +19,11 @@ func (rt *Router) recordingRuleGets(c *gin.Context) {
ginx.NewRender(c).Data(ars, err)
}

func (rt *Router) recordingRuleGetsByService(c *gin.Context) {
ars, err := models.RecordingRuleEnabledGets(rt.Ctx)
ginx.NewRender(c).Data(ars, err)
}

func (rt *Router) recordingRuleGet(c *gin.Context) {
rrid := ginx.UrlParamInt64(c, "rrid")

@@ -83,3 +83,18 @@ func (rt *Router) roleGets(c *gin.Context) {
lst, err := models.RoleGetsAll(rt.Ctx)
ginx.NewRender(c).Data(lst, err)
}

func (rt *Router) allPerms(c *gin.Context) {
roles, err := models.RoleGetsAll(rt.Ctx)
ginx.Dangerous(err)
m := make(map[string][]string)
for _, r := range roles {
lst, err := models.OperationsOfRole(rt.Ctx, strings.Fields(r.Name))
if err != nil {
continue
}
m[r.Name] = lst
}

ginx.NewRender(c).Data(m, err)
}

@@ -1,6 +1,8 @@
package router

import (
"time"

"github.com/ccfos/nightingale/v6/models"

"github.com/gin-gonic/gin"
@@ -16,3 +18,17 @@ func (rt *Router) serverClustersGet(c *gin.Context) {
list, err := models.AlertingEngineGetsClusters(rt.Ctx, "")
ginx.NewRender(c).Data(list, err)
}

func (rt *Router) serverHeartbeat(c *gin.Context) {
var req models.HeartbeatInfo
ginx.BindJSON(c, &req)
err := models.AlertingEngineHeartbeatWithCluster(rt.Ctx, req.Instance, req.EngineCluster, req.DatasourceId)
ginx.NewRender(c).Message(err)
}

func (rt *Router) serversActive(c *gin.Context) {
datasourceId := ginx.QueryInt64(c, "dsid")

servers, err := models.AlertingEngineGetsInstances(rt.Ctx, "datasource_id = ? and clock > ?", datasourceId, time.Now().Unix()-30)
ginx.NewRender(c).Data(servers, err)
}

@@ -101,6 +101,11 @@ func (rt *Router) targetGets(c *gin.Context) {
}, nil)
}

func (rt *Router) targetGetsByService(c *gin.Context) {
lst, err := models.TargetGetsAll(rt.Ctx)
ginx.NewRender(c).Data(lst, err)
}

func (rt *Router) targetGetTags(c *gin.Context) {
idents := ginx.QueryStr(c, "idents", "")
idents = strings.ReplaceAll(idents, ",", " ")

@@ -120,6 +120,12 @@ func (f *taskForm) HandleFH(fh string) {
f.Title = f.Title + " FH: " + fh
}

func (rt *Router) taskRecordAdd(c *gin.Context) {
var f *models.TaskRecord
ginx.BindJSON(c, &f)
ginx.NewRender(c).Message(f.Add(rt.Ctx))
}

func (rt *Router) taskAdd(c *gin.Context) {
var f taskForm
ginx.BindJSON(c, &f)

@@ -12,19 +12,8 @@ import (
)

func (rt *Router) userFindAll(c *gin.Context) {
limit := ginx.QueryInt(c, "limit", 20)
query := ginx.QueryStr(c, "query", "")

total, err := models.UserTotal(rt.Ctx, query)
ginx.Dangerous(err)

list, err := models.UserGets(rt.Ctx, query, limit, ginx.Offset(c, limit))
ginx.Dangerous(err)

ginx.NewRender(c).Data(gin.H{
"list": list,
"total": total,
}, nil)
list, err := models.UserGetAll(rt.Ctx)
ginx.NewRender(c).Data(list, err)
}

func (rt *Router) userGets(c *gin.Context) {

@@ -29,6 +29,17 @@ func (rt *Router) userGroupGets(c *gin.Context) {
ginx.NewRender(c).Data(lst, err)
}

func (rt *Router) userGroupGetsByService(c *gin.Context) {
lst, err := models.UserGroupGetAll(rt.Ctx)
ginx.NewRender(c).Data(lst, err)
}

// user group member get by service
func (rt *Router) userGroupMemberGetsByService(c *gin.Context) {
members, err := models.UserGroupMemberGetAll(rt.Ctx)
ginx.NewRender(c).Data(members, err)
}

type userGroupForm struct {
Name string `json:"name" binding:"required"`
Note string `json:"note"`

@@ -18,7 +18,7 @@ func Upgrade(configFile string) error {
return err
}

ctx := ctx.NewContext(context.Background(), db)
ctx := ctx.NewContext(context.Background(), db, false)
for _, cluster := range config.Clusters {
count, err := models.GetDatasourcesCountBy(ctx, "", "", cluster.Name)
if err != nil {

@@ -41,6 +41,9 @@ alter table `alert_his_event` add annotations text not null comment 'annotations
alter table `alert_his_event` add rule_config text not null comment 'rule_config';

alter table `alerting_engines` add datasource_id bigint unsigned not null default 0;
alter table `alerting_engines` change cluster engine_cluster varchar(128) not null default '' comment 'n9e engine cluster';

alter table `task_record` add event_id bigint not null comment 'event id' default 0;

CREATE TABLE `datasource`
(

61
conf/conf.go
@@ -2,6 +2,7 @@ package conf

import (
"fmt"
"net"
"os"
"strings"

@@ -13,20 +14,28 @@ import (
|
||||
"github.com/ccfos/nightingale/v6/pkg/ormx"
|
||||
"github.com/ccfos/nightingale/v6/pushgw/pconf"
|
||||
"github.com/ccfos/nightingale/v6/storage"
|
||||
|
||||
"github.com/gin-gonic/gin"
|
||||
)
|
||||
|
||||
type ConfigType struct {
|
||||
Global GlobalConfig
|
||||
Log logx.Config
|
||||
HTTP httpx.Config
|
||||
DB ormx.DBConfig
|
||||
Redis storage.RedisConfig
|
||||
Global GlobalConfig
|
||||
Log logx.Config
|
||||
HTTP httpx.Config
|
||||
DB ormx.DBConfig
|
||||
Redis storage.RedisConfig
|
||||
CenterApi CenterApi
|
||||
|
||||
Pushgw pconf.Pushgw
|
||||
Alert aconf.Alert
|
||||
Center cconf.Center
|
||||
}
|
||||
|
||||
type CenterApi struct {
|
||||
Addrs []string
|
||||
BasicAuth gin.Accounts
|
||||
}
|
||||
|
||||
type GlobalConfig struct {
|
||||
RunMode string
|
||||
}
|
||||
@@ -49,28 +58,36 @@ func InitConfig(configDir, cryptoKey string) (*ConfigType, error) {
|
||||
|
||||
if config.Alert.Heartbeat.IP == "" {
|
||||
// auto detect
|
||||
// config.Alert.Heartbeat.IP = fmt.Sprint(GetOutboundIP())
|
||||
// 自动获取IP在有些环境下容易出错,这里用hostname+pid来作唯一标识
|
||||
config.Alert.Heartbeat.IP = fmt.Sprint(GetOutboundIP())
|
||||
if config.Alert.Heartbeat.IP == "" {
|
||||
hostname, err := os.Hostname()
|
||||
if err != nil {
|
||||
fmt.Println("failed to get hostname:", err)
|
||||
os.Exit(1)
|
||||
}
|
||||
|
||||
hostname, err := os.Hostname()
|
||||
if err != nil {
|
||||
fmt.Println("failed to get hostname:", err)
|
||||
os.Exit(1)
|
||||
if strings.Contains(hostname, "localhost") {
|
||||
fmt.Println("Warning! hostname contains substring localhost, setting a more unique hostname is recommended")
|
||||
}
|
||||
|
||||
config.Alert.Heartbeat.IP = hostname
|
||||
}
|
||||
|
||||
if strings.Contains(hostname, "localhost") {
|
||||
fmt.Println("Warning! hostname contains substring localhost, setting a more unique hostname is recommended")
|
||||
}
|
||||
|
||||
config.Alert.Heartbeat.IP = hostname
|
||||
|
||||
// if config.Alert.Heartbeat.IP == "" {
|
||||
// fmt.Println("heartbeat ip auto got is blank")
|
||||
// os.Exit(1)
|
||||
// }
|
||||
}
|
||||
|
||||
config.Alert.Heartbeat.Endpoint = fmt.Sprintf("%s:%d", config.Alert.Heartbeat.IP, config.HTTP.Port)
|
||||
|
||||
return config, nil
|
||||
}
|
||||
|
||||
func GetOutboundIP() net.IP {
|
||||
conn, err := net.Dial("udp", "223.5.5.5:80")
|
||||
if err != nil {
|
||||
fmt.Println("auto get outbound ip fail:", err)
|
||||
return []byte{}
|
||||
}
|
||||
defer conn.Close()
|
||||
|
||||
localAddr := conn.LocalAddr().(*net.UDPAddr)
|
||||
|
||||
return localAddr.IP
|
||||
}
|
||||
|
||||
@@ -14,39 +14,22 @@ func decryptConfig(config *ConfigType, cryptoKey string) error {
|
||||
|
||||
config.DB.DSN = decryptDsn
|
||||
|
||||
for k := range config.HTTP.Alert.BasicAuth {
|
||||
decryptPwd, err := secu.DealWithDecrypt(config.HTTP.Alert.BasicAuth[k], cryptoKey)
|
||||
for k := range config.HTTP.APIForService.BasicAuth {
|
||||
decryptPwd, err := secu.DealWithDecrypt(config.HTTP.APIForService.BasicAuth[k], cryptoKey)
|
||||
if err != nil {
|
||||
return fmt.Errorf("failed to decrypt http basic auth password: %s", err)
|
||||
}
|
||||
|
||||
config.HTTP.Alert.BasicAuth[k] = decryptPwd
|
||||
config.HTTP.APIForService.BasicAuth[k] = decryptPwd
|
||||
}
|
||||
|
||||
for k := range config.HTTP.Pushgw.BasicAuth {
|
||||
decryptPwd, err := secu.DealWithDecrypt(config.HTTP.Pushgw.BasicAuth[k], cryptoKey)
|
||||
for k := range config.HTTP.APIForAgent.BasicAuth {
|
||||
decryptPwd, err := secu.DealWithDecrypt(config.HTTP.APIForAgent.BasicAuth[k], cryptoKey)
|
||||
if err != nil {
|
||||
return fmt.Errorf("failed to decrypt http basic auth password: %s", err)
|
||||
}
|
||||
|
||||
config.HTTP.Pushgw.BasicAuth[k] = decryptPwd
|
||||
}
|
||||
|
||||
for k := range config.HTTP.Heartbeat.BasicAuth {
|
||||
decryptPwd, err := secu.DealWithDecrypt(config.HTTP.Heartbeat.BasicAuth[k], cryptoKey)
|
||||
if err != nil {
|
||||
return fmt.Errorf("failed to decrypt http basic auth password: %s", err)
|
||||
}
|
||||
|
||||
config.HTTP.Heartbeat.BasicAuth[k] = decryptPwd
|
||||
}
|
||||
|
||||
for k := range config.HTTP.Service.BasicAuth {
|
||||
decryptPwd, err := secu.DealWithDecrypt(config.HTTP.Service.BasicAuth[k], cryptoKey)
|
||||
if err != nil {
|
||||
return fmt.Errorf("failed to decrypt http basic auth password: %s", err)
|
||||
}
|
||||
config.HTTP.Service.BasicAuth[k] = decryptPwd
|
||||
config.HTTP.APIForAgent.BasicAuth[k] = decryptPwd
|
||||
}
|
||||
|
||||
for i, v := range config.Pushgw.Writers {
|
||||
|
||||
147
doc/README.bak.md
Normal file
@@ -0,0 +1,147 @@
|
||||
<p align="center">
|
||||
<a href="https://github.com/ccfos/nightingale">
|
||||
<img src="doc/img/nightingale_logo_h.png" alt="nightingale - cloud native monitoring" width="240" /></a>
|
||||
</p>
|
||||
|
||||
<p align="center">
|
||||
<img alt="GitHub latest release" src="https://img.shields.io/github/v/release/ccfos/nightingale"/>
|
||||
<a href="https://n9e.github.io">
|
||||
<img alt="Docs" src="https://img.shields.io/badge/docs-get%20started-brightgreen"/></a>
|
||||
<a href="https://hub.docker.com/u/flashcatcloud">
|
||||
<img alt="Docker pulls" src="https://img.shields.io/docker/pulls/flashcatcloud/nightingale"/></a>
|
||||
<img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/ccfos/nightingale">
|
||||
<img alt="GitHub Repo issues" src="https://img.shields.io/github/issues/ccfos/nightingale">
|
||||
<img alt="GitHub Repo issues closed" src="https://img.shields.io/github/issues-closed/ccfos/nightingale">
|
||||
<img alt="GitHub forks" src="https://img.shields.io/github/forks/ccfos/nightingale">
|
||||
<a href="https://github.com/ccfos/nightingale/graphs/contributors">
|
||||
<img alt="GitHub contributors" src="https://img.shields.io/github/contributors-anon/ccfos/nightingale"/></a>
|
||||
<a href="https://n9e-talk.slack.com/">
|
||||
<img alt="GitHub contributors" src="https://img.shields.io/badge/join%20slack-%23n9e-brightgreen.svg"/></a>
|
||||
<img alt="License" src="https://img.shields.io/badge/license-Apache--2.0-blue"/>
|
||||
</p>
|
||||
<p align="center">
|
||||
<b>All-in-one</b> 的开源观测平台 <br/>
|
||||
<b>开箱即用</b>,集数据采集、可视化、监控告警于一体 <br/>
|
||||
推荐升级您的 <b>Prometheus + AlertManager + Grafana + ELK + Jaeger</b> 组合方案到夜莺!
|
||||
</p>
|
||||
|
||||
[English](./README_en.md) | [中文](./README.md)
|
||||
|
||||
|
||||
|
||||
## 功能和特点
|
||||
|
||||
- **开箱即用**
|
||||
- 支持 Docker、Helm Chart、云服务等多种部署方式,集数据采集、监控告警、可视化为一体,内置多种监控仪表盘、快捷视图、告警规则模板,导入即可快速使用,**大幅降低云原生监控系统的建设成本、学习成本、使用成本**;
|
||||
- **专业告警**
|
||||
- 可视化的告警配置和管理,支持丰富的告警规则,提供屏蔽规则、订阅规则的配置能力,支持告警多种送达渠道,支持告警自愈、告警事件管理等;
|
||||
- **推荐您使用夜莺的同时,无缝搭配[FlashDuty](https://flashcat.cloud/product/flashcat-duty/),实现告警聚合收敛、认领、升级、排班、协同,让告警的触达既高效,又确保告警处理不遗漏、做到件件有回响**。
|
||||
- **云原生**
|
||||
- 以交钥匙的方式快速构建企业级的云原生监控体系,支持 [Categraf](https://github.com/flashcatcloud/categraf)、Telegraf、Grafana-agent 等多种采集器,支持 Prometheus、VictoriaMetrics、M3DB、ElasticSearch、Jaeger 等多种数据源,兼容支持导入 Grafana 仪表盘,**与云原生生态无缝集成**;
|
||||
- **高性能 高可用**
|
||||
- 得益于夜莺的多数据源管理引擎,和夜莺引擎侧优秀的架构设计,借助于高性能时序库,可以满足数亿时间线的采集、存储、告警分析场景,节省大量成本;
|
||||
- 夜莺监控组件均可水平扩展,无单点,已在上千家企业部署落地,经受了严苛的生产实践检验。众多互联网头部公司,夜莺集群机器达百台,处理数亿级时间线,重度使用夜莺监控;
|
||||
- **灵活扩展 中心化管理**
|
||||
- 夜莺监控,可部署在 1 核 1G 的云主机,可在上百台机器集群化部署,可运行在 K8s 中;也可将时序库、告警引擎等组件下沉到各机房、各 Region,兼顾边缘部署和中心化统一管理,**解决数据割裂,缺乏统一视图的难题**;
|
||||
- **开放社区**
|
||||
- 托管于[中国计算机学会开源发展委员会](https://www.ccf.org.cn/kyfzwyh/),有[快猫星云](https://flashcat.cloud)和众多公司的持续投入,和数千名社区用户的积极参与,以及夜莺监控项目清晰明确的定位,都保证了夜莺开源社区健康、长久的发展。活跃、专业的社区用户也在持续迭代和沉淀更多的最佳实践于产品中;
|
||||
|
||||
## 使用场景
|
||||
1. **如果您希望在一个平台中,统一管理和查看 Metrics、Logging、Tracing 数据,推荐你使用夜莺**:
|
||||
- 请参考阅读:[不止于监控,夜莺 V6 全新升级为开源观测平台](http://flashcat.cloud/blog/nightingale-v6-release/)
|
||||
2. **如果您在使用 Prometheus 过程中,有以下的一个或者多个需求场景,推荐您无缝升级到夜莺**:
|
||||
- Prometheus、Alertmanager、Grafana 等多个系统较为割裂,缺乏统一视图,无法开箱即用;
|
||||
- 通过修改配置文件来管理 Prometheus、Alertmanager 的方式,学习曲线大,协同有难度;
|
||||
- 数据量过大而无法扩展您的 Prometheus 集群;
|
||||
- 生产环境运行多套 Prometheus 集群,面临管理和使用成本高的问题;
|
||||
3. **如果您在使用 Zabbix,有以下的场景,推荐您升级到夜莺**:
|
||||
- 监控的数据量太大,希望有更好的扩展解决方案;
|
||||
- 学习曲线高,多人多团队模式下,希望有更好的协同使用效率;
|
||||
- 微服务和云原生架构下,监控数据的生命周期多变、监控数据维度基数高,Zabbix 数据模型不易适配;
|
||||
- 了解更多Zabbix和夜莺监控的对比,推荐您进一步阅读[Zabbix 和夜莺监控选型对比](https://flashcat.cloud/blog/zabbx-vs-nightingale/)
|
||||
4. **如果您在使用 [Open-Falcon](https://github.com/open-falcon/falcon-plus),我们推荐您升级到夜莺:**
|
||||
- 关于 Open-Falcon 和夜莺的详细介绍,请参考阅读:[云原生监控的十个特点和趋势](http://flashcat.cloud/blog/10-trends-of-cloudnative-monitoring/)
|
||||
- 监控系统和可观测平台的区别,请参考阅读:[从监控系统到可观测平台,Gap有多大
|
||||
](https://flashcat.cloud/blog/gap-of-monitoring-to-o11y/)
|
||||
5. **我们推荐您使用 [Categraf](https://github.com/flashcatcloud/categraf) 作为首选的监控数据采集器**:
|
||||
- [Categraf](https://github.com/flashcatcloud/categraf) 是夜莺监控的默认采集器,采用开放插件机制和 All-in-one 的设计理念,同时支持 metric、log、trace、event 的采集。Categraf 不仅可以采集 CPU、内存、网络等系统层面的指标,也集成了众多开源组件的采集能力,支持K8s生态。Categraf 内置了对应的仪表盘和告警规则,开箱即用。
|
||||
|
||||
## 文档
|
||||
|
||||
[English Doc](https://n9e.github.io/) | [中文文档](https://flashcat.cloud/docs/)
|
||||
|
||||
## 产品示意图
|
||||
|
||||
https://user-images.githubusercontent.com/792850/216888712-2565fcea-9df5-47bd-a49e-d60af9bd76e8.mp4
|
||||
|
||||
## 夜莺架构
|
||||
|
||||
夜莺监控可以接收各种采集器上报的监控数据(比如 [Categraf](https://github.com/flashcatcloud/categraf)、telegraf、grafana-agent、Prometheus),并写入多种流行的时序数据库中(可以支持Prometheus、M3DB、VictoriaMetrics、Thanos、TDEngine等),提供告警规则、屏蔽规则、订阅规则的配置能力,提供监控数据的查看能力,提供告警自愈机制(告警触发之后自动回调某个webhook地址或者执行某个脚本),提供历史告警事件的存储管理、分组查看的能力。
|
||||
|
||||
### 中心汇聚式部署方案
|
||||
|
||||

|
||||
|
||||
夜莺只有一个模块,就是 n9e,可以部署多个 n9e 实例组成集群,n9e 依赖 2 个存储,数据库、Redis,数据库可以使用 MySQL 或 Postgres,自己按需选用。
|
||||
|
||||
n9e 提供的是 HTTP 接口,前面负载均衡可以是 4 层的,也可以是 7 层的。一般就选用 Nginx 就可以了。
|
||||
|
||||
n9e 这个模块接收到数据之后,需要转发给后端的时序库,相关配置是:
|
||||
|
||||
```toml
|
||||
[Pushgw]
|
||||
LabelRewrite = true
|
||||
[[Pushgw.Writers]]
|
||||
Url = "http://127.0.0.1:9090/api/v1/write"
|
||||
```
|
||||
|
||||
> 注意:虽然数据源可以在页面配置了,但是上报转发链路,还是需要在配置文件指定。
|
||||
|
||||
所有机房的 agent( 比如 Categraf、Telegraf、 Grafana-agent、Datadog-agent ),都直接推数据给 n9e,这个架构最为简单,维护成本最低。当然,前提是要求机房之间网络链路比较好,一般有专线。如果网络链路不好,则要使用下面的部署方式了。
|
||||
|
||||
### 边缘下沉式混杂部署方案
|
||||
|
||||

|
||||
|
||||
这个图尝试解释 3 种不同的情形,比如 A 机房和中心网络链路很好,Categraf 可以直接汇报数据给中心 n9e 模块,另一个机房网络链路不好,就需要把时序库下沉部署,时序库下沉了,对应的告警引擎和转发网关也都要跟随下沉,这样数据不会跨机房传输,比较稳定。但是心跳还是需要往中心心跳,要不然在对象列表里看不到机器的 CPU、内存使用率。还有的时候,可能是接入的一个已有的 Prometheus,数据采集没有走 Categraf,那此时只需要把 Prometheus 作为数据源接入夜莺即可,可以在夜莺里看图、配告警规则,但是就是在对象列表里看不到,也不能使用告警自愈的功能,问题也不大,核心功能都不受影响。
|
||||
|
||||
边缘机房,下沉部署时序库、告警引擎、转发网关的时候,要注意,告警引擎需要依赖数据库,因为要同步告警规则,转发网关也要依赖数据库,因为要注册对象到数据库里去,需要打通相关网络,告警引擎和转发网关都不用Redis,所以无需为 Redis 打通网络。
|
||||
|
||||
### VictoriaMetrics 集群架构
|
||||
<img src="doc/img/install-vm.png" width="600">
|
||||
|
||||
如果单机版本的时序数据库(比如 Prometheus) 性能有瓶颈或容灾较差,我们推荐使用 [VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics),VictoriaMetrics 架构较为简单,性能优异,易于部署和运维,架构图如上。VictoriaMetrics 更详尽的文档,还请参考其[官网](https://victoriametrics.com/)。
|
||||
|
||||
## 夜莺社区
|
||||
|
||||
开源项目要更有生命力,离不开开放的治理架构和源源不断的开发者和用户共同参与,我们致力于建立开放、中立的开源治理架构,吸纳更多来自企业、高校等各方面对云原生监控感兴趣、有热情的开发者,一起打造有活力的夜莺开源社区。关于《夜莺开源项目和社区治理架构(草案)》,请查阅 [COMMUNITY GOVERNANCE](./doc/community-governance.md).
|
||||
|
||||
**我们欢迎您以各种方式参与到夜莺开源项目和开源社区中来,工作包括不限于**:
|
||||
- 补充和完善文档 => [n9e.github.io](https://n9e.github.io/)
|
||||
- 分享您在使用夜莺监控过程中的最佳实践和经验心得 => [文章分享](https://flashcat.cloud/docs/content/flashcat-monitor/nightingale/share/)
|
||||
- 提交产品建议 =》 [github issue](https://github.com/ccfos/nightingale/issues/new?assignees=&labels=kind%2Ffeature&template=enhancement.md)
|
||||
- 提交代码,让夜莺监控更快、更稳、更好用 => [github pull request](https://github.com/didi/nightingale/pulls)
|
||||
|
||||
**尊重、认可和记录每一位贡献者的工作**是夜莺开源社区的第一指导原则,我们提倡**高效的提问**,这既是对开发者时间的尊重,也是对整个社区知识沉淀的贡献:
|
||||
- 提问之前请先查阅 [FAQ](https://www.gitlink.org.cn/ccfos/nightingale/wiki/faq)
|
||||
- 我们使用[论坛](https://answer.flashcat.cloud/)进行交流,有问题可以到这里搜索、提问
|
||||
|
||||
|
||||
## Who is using Nightingale
|
||||
|
||||
您可以通过在 **[Who is Using Nightingale](https://github.com/ccfos/nightingale/issues/897)** 登记您的使用情况,分享您的使用经验。
|
||||
|
||||
## Stargazers over time
|
||||
[](https://starchart.cc/ccfos/nightingale)
|
||||
|
||||
## Contributors
|
||||
<a href="https://github.com/ccfos/nightingale/graphs/contributors">
|
||||
<img src="https://contrib.rocks/image?repo=ccfos/nightingale" />
|
||||
</a>
|
||||
|
||||
## License
|
||||
[Apache License V2.0](https://github.com/didi/nightingale/blob/main/LICENSE)
|
||||
|
||||
## 加入交流群
|
||||
|
||||
<img src="doc/img/wecom.png" width="120">
|
||||
@@ -4,8 +4,7 @@ FROM python:3-slim
|
||||
WORKDIR /app
|
||||
ADD n9e /app
|
||||
ADD http://download.flashcat.cloud/wait /wait
|
||||
RUN mkdir -p /app/pub && chmod +x /wait
|
||||
ADD pub /app/pub/
|
||||
RUN chmod +x /wait
|
||||
RUN chmod +x n9e
|
||||
|
||||
EXPOSE 17000
|
||||
|
||||
@@ -1,15 +1,12 @@
|
||||
FROM --platform=$BUILDPLATFORM python:3-slim
|
||||
FROM --platform=$TARGETPLATFORM python:3-slim
|
||||
|
||||
|
||||
WORKDIR /app
|
||||
ADD n9e /app
|
||||
ADD etc /app
|
||||
RUN mkdir -p /app/integrations
|
||||
ADD n9e /app/
|
||||
ADD etc /app/
|
||||
ADD integrations /app/integrations/
|
||||
ADD http://download.flashcat.cloud/wait /wait
|
||||
RUN mkdir -p /app/pub && chmod +x /wait
|
||||
ADD pub /app/pub/
|
||||
RUN chmod +x n9e
|
||||
ADD --chmod=755 https://github.com/ufoscout/docker-compose-wait/releases/download/2.11.0/wait_x86_64 /wait
|
||||
RUN chmod +x /wait
|
||||
|
||||
EXPOSE 17000
|
||||
|
||||
|
||||
13
docker/Dockerfile.goreleaser.arm64
Normal file
@@ -0,0 +1,13 @@
|
||||
FROM flashcatcloud/toolbox:v0.0.1 as toolbox
|
||||
FROM --platform=$TARGETPLATFORM python:3-slim
|
||||
|
||||
|
||||
WORKDIR /app
|
||||
ADD n9e /app/
|
||||
ADD etc /app/
|
||||
ADD integrations /app/integrations/
|
||||
COPY --chmod=755 --from=toolbox /toolbox/wait_aarch64 /wait
|
||||
|
||||
EXPOSE 17000
|
||||
|
||||
CMD ["/app/n9e", "-h"]
|
||||
@@ -10,7 +10,6 @@ echo "tag: ${tag}"
|
||||
|
||||
rm -rf n9e pub
|
||||
cp ../n9e .
|
||||
cp -r ../pub .
|
||||
|
||||
docker build -t nightingale:${tag} .
|
||||
|
||||
|
||||
@@ -251,7 +251,6 @@ COMMENT ON COLUMN chart.group_id IS 'chart group id';
|
||||
CREATE TABLE chart_share (
|
||||
id bigserial,
|
||||
cluster varchar(128) not null,
|
||||
dashboard_id bigint not null,
|
||||
datasource_id bigint not null default 0,
|
||||
configs text,
|
||||
create_at bigint not null default 0,
|
||||
@@ -651,6 +650,7 @@ COMMENT ON COLUMN task_tpl_host.host IS 'ip or hostname';
|
||||
CREATE TABLE task_record
|
||||
(
|
||||
id bigint not null ,
|
||||
event_id bigint not null default 0,
|
||||
group_id bigint not null ,
|
||||
ibex_address varchar(128) not null,
|
||||
ibex_auth_user varchar(128) not null default '',
|
||||
@@ -669,22 +669,23 @@ CREATE TABLE task_record
|
||||
) ;
|
||||
CREATE INDEX task_record_cg_idx ON task_record (create_at, group_id);
|
||||
CREATE INDEX task_record_create_by_idx ON task_record (create_by);
|
||||
CREATE INDEX task_record_event_id_idx ON task_record (event_id);
|
||||
COMMENT ON COLUMN task_record.id IS 'ibex task id';
|
||||
COMMENT ON COLUMN task_record.group_id IS 'busi group id';
|
||||
|
||||
COMMENT ON COLUMN task_record.event_id IS 'event id';
|
||||
|
||||
CREATE TABLE alerting_engines
|
||||
(
|
||||
id serial,
|
||||
instance varchar(128) not null default '' ,
|
||||
datasource_id bigint not null default 0 ,
|
||||
cluster varchar(128) not null default '' ,
|
||||
engine_cluster varchar(128) not null default '' ,
|
||||
clock bigint not null,
|
||||
PRIMARY KEY (id)
|
||||
) ;
|
||||
COMMENT ON COLUMN alerting_engines.instance IS 'instance identification, e.g. 10.9.0.9:9090';
|
||||
COMMENT ON COLUMN alerting_engines.datasource_id IS 'datasource id';
|
||||
COMMENT ON COLUMN alerting_engines.cluster IS 'target reader cluster';
|
||||
COMMENT ON COLUMN alerting_engines.engine_cluster IS 'target reader cluster';
|
||||
|
||||
|
||||
CREATE TABLE datasource
|
||||
|
||||
@@ -167,7 +167,7 @@ CREATE TABLE `busi_group_member` (
|
||||
KEY (`user_group_id`)
|
||||
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4;
|
||||
|
||||
insert into busi_group_member(busi_group_id, user_group_id, perm_flag) values(1, 1, "rw");
|
||||
insert into busi_group_member(busi_group_id, user_group_id, perm_flag) values(1, 1, 'rw');
|
||||
|
||||
-- for dashboard new version
|
||||
CREATE TABLE `board` (
|
||||
@@ -334,7 +334,7 @@ CREATE TABLE `alert_subscribe` (
|
||||
KEY (`update_at`),
|
||||
KEY (`group_id`)
|
||||
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4;
|
||||
|
||||
|
||||
CREATE TABLE `target` (
|
||||
`id` bigint unsigned not null auto_increment,
|
||||
`group_id` bigint not null default 0 comment 'busi group id',
|
||||
@@ -383,7 +383,7 @@ CREATE TABLE `metric_view` (
|
||||
) ENGINE=InnoDB DEFAULT CHARSET = utf8mb4;
|
||||
|
||||
insert into metric_view(name, cate, configs) values('Host View', 0, '{"filters":[{"oper":"=","label":"__name__","value":"cpu_usage_idle"}],"dynamicLabels":[],"dimensionLabels":[{"label":"ident","value":""}]}');
|
||||
|
||||
|
||||
CREATE TABLE `recording_rule` (
|
||||
`id` bigint unsigned not null auto_increment,
|
||||
`group_id` bigint not null default '0' comment 'group_id',
|
||||
@@ -531,6 +531,7 @@ CREATE TABLE `task_tpl_host`
|
||||
CREATE TABLE `task_record`
|
||||
(
|
||||
`id` bigint unsigned not null comment 'ibex task id',
|
||||
`event_id` bigint not null comment 'event id' default 0,
|
||||
`group_id` bigint not null comment 'busi group id',
|
||||
`ibex_address` varchar(128) not null,
|
||||
`ibex_auth_user` varchar(128) not null default '',
|
||||
@@ -547,7 +548,8 @@ CREATE TABLE `task_record`
|
||||
`create_by` varchar(64) not null default '',
|
||||
PRIMARY KEY (`id`),
|
||||
KEY (`create_at`, `group_id`),
|
||||
KEY (`create_by`)
|
||||
KEY (`create_by`),
|
||||
KEY (`event_id`)
|
||||
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4;
|
||||
|
||||
CREATE TABLE `alerting_engines`
|
||||
@@ -555,12 +557,11 @@ CREATE TABLE `alerting_engines`
|
||||
`id` int unsigned NOT NULL AUTO_INCREMENT,
|
||||
`instance` varchar(128) not null default '' comment 'instance identification, e.g. 10.9.0.9:9090',
|
||||
`datasource_id` bigint not null default 0 comment 'datasource id',
|
||||
`cluster` varchar(128) not null default '' comment 'n9e-alert cluster',
|
||||
`engine_cluster` varchar(128) not null default '' comment 'n9e-alert cluster',
|
||||
`clock` bigint not null,
|
||||
PRIMARY KEY (`id`)
|
||||
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4;
|
||||
|
||||
|
||||
CREATE TABLE `datasource`
|
||||
(
|
||||
`id` int unsigned NOT NULL AUTO_INCREMENT,
|
||||
@@ -581,15 +582,15 @@ CREATE TABLE `datasource`
|
||||
`updated_by` varchar(64) not null default '',
|
||||
UNIQUE KEY (`name`),
|
||||
PRIMARY KEY (`id`)
|
||||
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4;
|
||||
|
||||
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4;
|
||||
|
||||
CREATE TABLE `builtin_cate` (
|
||||
`id` bigint unsigned not null auto_increment,
|
||||
`name` varchar(191) not null,
|
||||
`user_id` bigint not null default 0,
|
||||
PRIMARY KEY (`id`)
|
||||
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4;
|
||||
|
||||
|
||||
CREATE TABLE `notify_tpl` (
|
||||
`id` bigint unsigned not null auto_increment,
|
||||
`channel` varchar(32) not null,
|
||||
|
||||
60
etc/alert.toml.example
Normal file
@@ -0,0 +1,60 @@
|
||||
[Global]
|
||||
RunMode = "release"
|
||||
|
||||
[CenterApi]
|
||||
Addrs = ["http://127.0.0.1:17000"]
|
||||
[CenterApi.BasicAuth]
|
||||
user001 = "ccc26da7b9aba533cbb263a36c07dcc5"
|
||||
|
||||
[Alert]
|
||||
[Alert.Heartbeat]
|
||||
# auto detect if blank
|
||||
IP = ""
|
||||
# unit ms
|
||||
Interval = 1000
|
||||
EngineName = "default02"
|
||||
|
||||
[Log]
|
||||
# log write dir
|
||||
Dir = "logs"
|
||||
# log level: DEBUG INFO WARNING ERROR
|
||||
Level = "DEBUG"
|
||||
# stdout, stderr, file
|
||||
Output = "stdout"
|
||||
# # rotate by time
|
||||
# KeepHours = 4
|
||||
# # rotate by size
|
||||
# RotateNum = 3
|
||||
# # unit: MB
|
||||
# RotateSize = 256
|
||||
|
||||
[HTTP]
|
||||
# http listening address
|
||||
Host = "0.0.0.0"
|
||||
# http listening port
|
||||
Port = 17001
|
||||
# https cert file path
|
||||
CertFile = ""
|
||||
# https key file path
|
||||
KeyFile = ""
|
||||
# whether print access log
|
||||
PrintAccessLog = false
|
||||
# whether enable pprof
|
||||
PProf = false
|
||||
# expose prometheus /metrics?
|
||||
ExposeMetrics = true
|
||||
# http graceful shutdown timeout, unit: s
|
||||
ShutdownTimeout = 30
|
||||
# max content length: 64M
|
||||
MaxContentLength = 67108864
|
||||
# http server read timeout, unit: s
|
||||
ReadTimeout = 20
|
||||
# http server write timeout, unit: s
|
||||
WriteTimeout = 40
|
||||
# http server idle timeout, unit: s
|
||||
IdleTimeout = 120
|
||||
|
||||
[HTTP.Alert]
|
||||
Enable = true
|
||||
[HTTP.Alert.BasicAuth]
|
||||
user001 = "ccc26da7b9aba533cbb263a36c07dcc5"
|
||||
@@ -9,7 +9,7 @@ Level = "DEBUG"
|
||||
# stdout, stderr, file
|
||||
Output = "stdout"
|
||||
# # rotate by time
|
||||
# KeepHours: 4
|
||||
# KeepHours = 4
|
||||
# # rotate by size
|
||||
# RotateNum = 3
|
||||
# # unit: MB
|
||||
@@ -41,22 +41,12 @@ WriteTimeout = 40
|
||||
# http server idle timeout, unit: s
|
||||
IdleTimeout = 120
|
||||
|
||||
[HTTP.Pushgw]
|
||||
[HTTP.APIForAgent]
|
||||
Enable = true
|
||||
# [HTTP.Pushgw.BasicAuth]
|
||||
# user001 = "ccc26da7b9aba533cbb263a36c07dcc5"
|
||||
|
||||
[HTTP.Alert]
|
||||
Enable = true
|
||||
[HTTP.Alert.BasicAuth]
|
||||
user001 = "ccc26da7b9aba533cbb263a36c07dcc5"
|
||||
|
||||
[HTTP.Heartbeat]
|
||||
Enable = true
|
||||
# [HTTP.Heartbeat.BasicAuth]
|
||||
# user001 = "ccc26da7b9aba533cbb263a36c07dcc5"
|
||||
|
||||
[HTTP.Service]
|
||||
[HTTP.APIForService]
|
||||
Enable = true
|
||||
[HTTP.Service.BasicAuth]
|
||||
user001 = "ccc26da7b9aba533cbb263a36c07dcc5"
|
||||
@@ -178,4 +168,4 @@ MaxIdleConnsPerHost = 100
|
||||
# SourceLabels = ["__address__"]
|
||||
# Regex = "([^:]+)(?::\\d+)?"
|
||||
# Replacement = "$1:80"
|
||||
# TargetLabel = "__address__"
|
||||
# TargetLabel = "__address__"
|
||||
|
||||
103
etc/pushgw.toml.example
Normal file
@@ -0,0 +1,103 @@
|
||||
[Global]
|
||||
RunMode = "release"
|
||||
|
||||
[CenterApi]
|
||||
Addrs = ["http://127.0.0.1:17000"]
|
||||
[CenterApi.BasicAuth]
|
||||
user001 = "ccc26da7b9aba533cbb263a36c07dcc5"
|
||||
|
||||
[Pushgw]
|
||||
# use target labels in database instead of in series
|
||||
LabelRewrite = true
|
||||
# # default busigroup key name
|
||||
# BusiGroupLabelKey = "busigroup"
|
||||
# ForceUseServerTS = false
|
||||
|
||||
# [Pushgw.DebugSample]
|
||||
# ident = "xx"
|
||||
# __name__ = "xx"
|
||||
|
||||
# [Pushgw.WriterOpt]
|
||||
# # Writer Options
|
||||
# QueueCount = 1000
|
||||
# QueueMaxSize = 1000000
|
||||
# QueuePopSize = 1000
|
||||
# # ident or metric
|
||||
# ShardingKey = "ident"
|
||||
|
||||
[[Pushgw.Writers]]
|
||||
# Url = "http://127.0.0.1:8480/insert/0/prometheus/api/v1/write"
|
||||
Url = "http://127.0.0.1:9090/api/v1/write"
|
||||
# Basic auth username
|
||||
BasicAuthUser = ""
|
||||
# Basic auth password
|
||||
BasicAuthPass = ""
|
||||
# timeout settings, unit: ms
|
||||
Headers = ["X-From", "n9e"]
|
||||
Timeout = 10000
|
||||
DialTimeout = 3000
|
||||
TLSHandshakeTimeout = 30000
|
||||
ExpectContinueTimeout = 1000
|
||||
IdleConnTimeout = 90000
|
||||
# time duration, unit: ms
|
||||
KeepAlive = 30000
|
||||
MaxConnsPerHost = 0
|
||||
MaxIdleConns = 100
|
||||
MaxIdleConnsPerHost = 100
|
||||
## Optional TLS Config
|
||||
# UseTLS = false
|
||||
# TLSCA = "/etc/n9e/ca.pem"
|
||||
# TLSCert = "/etc/n9e/cert.pem"
|
||||
# TLSKey = "/etc/n9e/key.pem"
|
||||
# InsecureSkipVerify = false
|
||||
# [[Writers.WriteRelabels]]
|
||||
# Action = "replace"
|
||||
# SourceLabels = ["__address__"]
|
||||
# Regex = "([^:]+)(?::\\d+)?"
|
||||
# Replacement = "$1:80"
|
||||
# TargetLabel = "__address__"
|
||||
|
||||
[Log]
|
||||
# log write dir
|
||||
Dir = "logs"
|
||||
# log level: DEBUG INFO WARNING ERROR
|
||||
Level = "DEBUG"
|
||||
# stdout, stderr, file
|
||||
Output = "stdout"
|
||||
# # rotate by time
|
||||
# KeepHours = 4
|
||||
# # rotate by size
|
||||
# RotateNum = 3
|
||||
# # unit: MB
|
||||
# RotateSize = 256
|
||||
|
||||
[HTTP]
|
||||
# http listening address
|
||||
Host = "0.0.0.0"
|
||||
# http listening port
|
||||
Port = 17000
|
||||
# https cert file path
|
||||
CertFile = ""
|
||||
# https key file path
|
||||
KeyFile = ""
|
||||
# whether print access log
|
||||
PrintAccessLog = false
|
||||
# whether enable pprof
|
||||
PProf = false
|
||||
# expose prometheus /metrics?
|
||||
ExposeMetrics = true
|
||||
# http graceful shutdown timeout, unit: s
|
||||
ShutdownTimeout = 30
|
||||
# max content length: 64M
|
||||
MaxContentLength = 67108864
|
||||
# http server read timeout, unit: s
|
||||
ReadTimeout = 20
|
||||
# http server write timeout, unit: s
|
||||
WriteTimeout = 40
|
||||
# http server idle timeout, unit: s
|
||||
IdleTimeout = 120
|
||||
|
||||
[HTTP.Pushgw]
|
||||
Enable = true
|
||||
# [HTTP.Pushgw.BasicAuth]
|
||||
# user001 = "ccc26da7b9aba533cbb263a36c07dcc5"
|
||||
9
fe.sh
@@ -7,4 +7,11 @@ curl -o n9e-fe-${VERSION}.tar.gz -L https://github.com/n9e/fe/releases/download/
|
||||
|
||||
tar zxvf n9e-fe-${VERSION}.tar.gz
|
||||
|
||||
cp ./docker/initsql/a-n9e.sql n9e.sql
|
||||
cp ./docker/initsql/a-n9e.sql n9e.sql
|
||||
|
||||
# Embed files into a Go executable
|
||||
statik -src=./pub -dest=./front
|
||||
|
||||
# rm the fe file
|
||||
rm n9e-fe-${VERSION}.tar.gz
|
||||
rm -r ./pub
|
||||
14
front/statik/statik.go
Normal file
@@ -0,0 +1,14 @@
|
||||
// Code generated by statik. DO NOT EDIT.
|
||||
|
||||
package statik
|
||||
|
||||
import (
|
||||
"github.com/rakyll/statik/fs"
|
||||
)
|
||||
|
||||
|
||||
func init() {
|
||||
data := "PK\x05\x06\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
|
||||
fs.Register(data)
|
||||
}
|
||||
|
||||
23
go.mod
@@ -7,7 +7,7 @@ require (
|
||||
github.com/coreos/go-oidc v2.2.1+incompatible
|
||||
github.com/dgrijalva/jwt-go v3.2.0+incompatible
|
||||
github.com/gin-contrib/pprof v1.4.0
|
||||
github.com/gin-gonic/gin v1.8.2
|
||||
github.com/gin-gonic/gin v1.9.0
|
||||
github.com/go-ldap/ldap/v3 v3.4.4
|
||||
github.com/gogo/protobuf v1.3.2
|
||||
github.com/golang-jwt/jwt v3.2.2+incompatible
|
||||
@@ -17,12 +17,13 @@ require (
|
||||
github.com/json-iterator/go v1.1.12
|
||||
github.com/koding/multiconfig v0.0.0-20171124222453-69c27309b2d7
|
||||
github.com/mailru/easyjson v0.7.7
|
||||
github.com/mattn/go-isatty v0.0.16
|
||||
github.com/mattn/go-isatty v0.0.17
|
||||
github.com/pelletier/go-toml/v2 v2.0.6
|
||||
github.com/pkg/errors v0.9.1
|
||||
github.com/prometheus/client_golang v1.14.0
|
||||
github.com/prometheus/common v0.39.0
|
||||
github.com/prometheus/prometheus v2.5.0+incompatible
|
||||
github.com/rakyll/statik v0.1.7
|
||||
github.com/redis/go-redis/v9 v9.0.2
|
||||
github.com/tidwall/gjson v1.14.0
|
||||
github.com/toolkits/pkg v1.3.3
|
||||
@@ -36,17 +37,19 @@ require (
|
||||
require (
|
||||
github.com/Azure/go-ntlmssp v0.0.0-20220621081337-cb9428e4ac1e // indirect
|
||||
github.com/beorn7/perks v1.0.1 // indirect
|
||||
github.com/bytedance/sonic v1.8.0 // indirect
|
||||
github.com/cespare/xxhash/v2 v2.2.0 // indirect
|
||||
github.com/chenzhuoyu/base64x v0.0.0-20221115062448-fe3a3abad311 // indirect
|
||||
github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f // indirect
|
||||
github.com/fatih/camelcase v1.0.0 // indirect
|
||||
github.com/fatih/structs v1.1.0 // indirect
|
||||
github.com/gin-contrib/sse v0.1.0 // indirect
|
||||
github.com/go-asn1-ber/asn1-ber v1.5.4 // indirect
|
||||
github.com/go-playground/locales v0.14.0 // indirect
|
||||
github.com/go-playground/universal-translator v0.18.0 // indirect
|
||||
github.com/go-playground/validator/v10 v10.11.1 // indirect
|
||||
github.com/go-playground/locales v0.14.1 // indirect
|
||||
github.com/go-playground/universal-translator v0.18.1 // indirect
|
||||
github.com/go-playground/validator/v10 v10.11.2 // indirect
|
||||
github.com/go-sql-driver/mysql v1.6.0 // indirect
|
||||
github.com/goccy/go-json v0.9.11 // indirect
|
||||
github.com/goccy/go-json v0.10.0 // indirect
|
||||
github.com/grpc-ecosystem/grpc-gateway v1.16.0 // indirect
|
||||
github.com/jackc/chunkreader/v2 v2.0.1 // indirect
|
||||
github.com/jackc/pgconn v1.13.0 // indirect
|
||||
@@ -59,6 +62,7 @@ require (
|
||||
github.com/jinzhu/inflection v1.0.0 // indirect
|
||||
github.com/jinzhu/now v1.1.5 // indirect
|
||||
github.com/josharian/intern v1.0.0 // indirect
|
||||
github.com/klauspost/cpuid/v2 v2.0.9 // indirect
|
||||
github.com/leodido/go-urn v1.2.1 // indirect
|
||||
github.com/matttproud/golang_protobuf_extensions v1.0.4 // indirect
|
||||
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
|
||||
@@ -69,9 +73,11 @@ require (
|
||||
github.com/spaolacci/murmur3 v1.1.0 // indirect
|
||||
github.com/tidwall/match v1.1.1 // indirect
|
||||
github.com/tidwall/pretty v1.2.0 // indirect
|
||||
github.com/ugorji/go/codec v1.2.7 // indirect
|
||||
github.com/twitchyliquid64/golang-asm v0.15.1 // indirect
|
||||
github.com/ugorji/go/codec v1.2.9 // indirect
|
||||
go.uber.org/automaxprocs v1.4.0 // indirect
|
||||
golang.org/x/crypto v0.1.0 // indirect
|
||||
golang.org/x/arch v0.0.0-20210923205945-b76863e36670 // indirect
|
||||
golang.org/x/crypto v0.5.0 // indirect
|
||||
golang.org/x/net v0.7.0 // indirect
|
||||
golang.org/x/sys v0.5.0 // indirect
|
||||
golang.org/x/text v0.7.0 // indirect
|
||||
@@ -82,4 +88,5 @@ require (
|
||||
gopkg.in/alexcesaro/quotedprintable.v3 v3.0.0-20150716171945-2caba252f4dc // indirect
|
||||
gopkg.in/square/go-jose.v2 v2.6.0 // indirect
|
||||
gopkg.in/yaml.v2 v2.4.0 // indirect
|
||||
gopkg.in/yaml.v3 v3.0.1 // indirect
|
||||
)
|
||||
|
||||
47
go.sum
@@ -11,9 +11,15 @@ github.com/beorn7/perks v1.0.1 h1:VlbKKnNfV8bJzeqoa4cOKqO6bYr3WgKZxO8Z16+hsOM=
|
||||
github.com/beorn7/perks v1.0.1/go.mod h1:G2ZrVWU2WbWT9wwq4/hrbKbnv/1ERSJQ0ibhJ6rlkpw=
|
||||
github.com/bsm/ginkgo/v2 v2.5.0 h1:aOAnND1T40wEdAtkGSkvSICWeQ8L3UASX7YVCqQx+eQ=
|
||||
github.com/bsm/gomega v1.20.0 h1:JhAwLmtRzXFTx2AkALSLa8ijZafntmhSoU63Ok18Uq8=
|
||||
github.com/bytedance/sonic v1.5.0/go.mod h1:ED5hyg4y6t3/9Ku1R6dU/4KyJ48DZ4jPhfY1O2AihPM=
|
||||
github.com/bytedance/sonic v1.8.0 h1:ea0Xadu+sHlu7x5O3gKhRpQ1IKiMrSiHttPF0ybECuA=
|
||||
github.com/bytedance/sonic v1.8.0/go.mod h1:i736AoUSYt75HyZLoJW9ERYxcy6eaN6h4BZXU064P/U=
|
||||
github.com/census-instrumentation/opencensus-proto v0.2.1/go.mod h1:f6KPmirojxKA12rnyqOA5BBL4O983OfeGPqjHWSTneU=
|
||||
github.com/cespare/xxhash/v2 v2.2.0 h1:DC2CZ1Ep5Y4k3ZQ899DldepgrayRUGE6BBZ/cd9Cj44=
|
||||
github.com/cespare/xxhash/v2 v2.2.0/go.mod h1:VGX0DQ3Q6kWi7AoAeZDth3/j3BFtOZR5XLFGgcrjCOs=
|
||||
github.com/chenzhuoyu/base64x v0.0.0-20211019084208-fb5309c8db06/go.mod h1:DH46F32mSOjUmXrMHnKwZdA8wcEefY7UVqBKYGjpdQY=
|
||||
github.com/chenzhuoyu/base64x v0.0.0-20221115062448-fe3a3abad311 h1:qSGYFH7+jGhDF8vLC+iwCD4WpbV1EBDSzWkJODFLams=
|
||||
github.com/chenzhuoyu/base64x v0.0.0-20221115062448-fe3a3abad311/go.mod h1:b583jCggY9gE99b6G5LEC39OIiVsWj+R97kbl5odCEk=
|
||||
github.com/client9/misspell v0.3.4/go.mod h1:qj6jICC3Q7zFZvVWo7KLAzC3yx5G7kyvSDkc90ppPyw=
|
||||
github.com/cncf/udpa/go v0.0.0-20191209042840-269d4d468f6f/go.mod h1:M8M6+tZqaGXZJjfX53e64911xZQV5JYwmTeXPW+k8Sc=
|
||||
github.com/cockroachdb/apd v1.1.0 h1:3LFP3629v+1aKXU5Q37mxmRxX/pIu1nijXydLShEq5I=
|
||||
@@ -47,32 +53,34 @@ github.com/gin-contrib/sse v0.1.0 h1:Y/yl/+YNO8GZSjAhjMsSuLt29uWRFHdHYUb5lYOV9qE
|
||||
github.com/gin-contrib/sse v0.1.0/go.mod h1:RHrZQHXnP2xjPF+u1gW/2HnVO7nvIa9PG3Gm+fLHvGI=
|
||||
github.com/gin-gonic/gin v1.7.7/go.mod h1:axIBovoeJpVj8S3BwE0uPMTeReE4+AfFtqpqaZ1qq1U=
|
||||
github.com/gin-gonic/gin v1.8.1/go.mod h1:ji8BvRH1azfM+SYow9zQ6SZMvR8qOMZHmsCuWR9tTTk=
|
||||
github.com/gin-gonic/gin v1.8.2 h1:UzKToD9/PoFj/V4rvlKqTRKnQYyz8Sc1MJlv4JHPtvY=
|
||||
github.com/gin-gonic/gin v1.8.2/go.mod h1:qw5AYuDrzRTnhvusDsrov+fDIxp9Dleuu12h8nfB398=
|
||||
github.com/gin-gonic/gin v1.9.0 h1:OjyFBKICoexlu99ctXNR2gg+c5pKrKMuyjgARg9qeY8=
|
||||
github.com/gin-gonic/gin v1.9.0/go.mod h1:W1Me9+hsUSyj3CePGrd1/QrKJMSJ1Tu/0hFEH89961k=
|
||||
github.com/go-asn1-ber/asn1-ber v1.5.4 h1:vXT6d/FNDiELJnLb6hGNa309LMsrCoYFvpwHDF0+Y1A=
|
||||
github.com/go-asn1-ber/asn1-ber v1.5.4/go.mod h1:hEBeB/ic+5LoWskz+yKT7vGhhPYkProFKoKdwZRWMe0=
|
||||
github.com/go-kit/log v0.1.0/go.mod h1:zbhenjAZHb184qTLMA9ZjW7ThYL0H2mk7Q6pNt4vbaY=
|
||||
github.com/go-ldap/ldap/v3 v3.4.4 h1:qPjipEpt+qDa6SI/h1fzuGWoRUY+qqQ9sOZq67/PYUs=
|
||||
github.com/go-ldap/ldap/v3 v3.4.4/go.mod h1:fe1MsuN5eJJ1FeLT/LEBVdWfNWKh459R7aXgXtJC+aI=
|
||||
github.com/go-logfmt/logfmt v0.5.0/go.mod h1:wCYkCAKZfumFQihp8CzCvQ3paCTfi41vtzG1KdI/P7A=
|
||||
github.com/go-playground/assert/v2 v2.0.1 h1:MsBgLAaY856+nPRTKrp3/OZK38U/wa0CcBYNjji3q3A=
|
||||
github.com/go-playground/assert/v2 v2.0.1/go.mod h1:VDjEfimB/XKnb+ZQfWdccd7VUvScMdVu0Titje2rxJ4=
|
||||
github.com/go-playground/assert/v2 v2.2.0 h1:JvknZsQTYeFEAhQwI4qEt9cyV5ONwRHC+lYKSsYSR8s=
|
||||
github.com/go-playground/locales v0.13.0/go.mod h1:taPMhCMXrRLJO55olJkUXHZBHCxTMfnGwq/HNwmWNS8=
|
||||
github.com/go-playground/locales v0.14.0 h1:u50s323jtVGugKlcYeyzC0etD1HifMjqmJqb8WugfUU=
|
||||
github.com/go-playground/locales v0.14.0/go.mod h1:sawfccIbzZTqEDETgFXqTho0QybSa7l++s0DH+LDiLs=
|
||||
github.com/go-playground/locales v0.14.1 h1:EWaQ/wswjilfKLTECiXz7Rh+3BjFhfDFKv/oXslEjJA=
|
||||
github.com/go-playground/locales v0.14.1/go.mod h1:hxrqLVvrK65+Rwrd5Fc6F2O76J/NuW9t0sjnWqG1slY=
|
||||
github.com/go-playground/universal-translator v0.17.0/go.mod h1:UkSxE5sNxxRwHyU+Scu5vgOQjsIJAF8j9muTVoKLVtA=
|
||||
github.com/go-playground/universal-translator v0.18.0 h1:82dyy6p4OuJq4/CByFNOn/jYrnRPArHwAcmLoJZxyho=
|
||||
github.com/go-playground/universal-translator v0.18.0/go.mod h1:UvRDBj+xPUEGrFYl+lu/H90nyDXpg0fqeB/AQUGNTVA=
|
||||
github.com/go-playground/universal-translator v0.18.1 h1:Bcnm0ZwsGyWbCzImXv+pAJnYK9S473LQFuzCbDbfSFY=
|
||||
github.com/go-playground/universal-translator v0.18.1/go.mod h1:xekY+UJKNuX9WP91TpwSH2VMlDf28Uj24BCp08ZFTUY=
|
||||
github.com/go-playground/validator/v10 v10.4.1/go.mod h1:nlOn6nFhuKACm19sB/8EGNn9GlaMV7XkbRSipzJ0Ii4=
|
||||
github.com/go-playground/validator/v10 v10.10.0/go.mod h1:74x4gJWsvQexRdW8Pn3dXSGrTK4nAUsbPlLADvpJkos=
|
||||
github.com/go-playground/validator/v10 v10.11.1 h1:prmOlTVv+YjZjmRmNSF3VmspqJIxJWXmqUsHwfTRRkQ=
|
||||
github.com/go-playground/validator/v10 v10.11.1/go.mod h1:i+3WkQ1FvaUjjxh1kSvIA4dMGDBiPU55YFDl0WbKdWU=
|
||||
github.com/go-playground/validator/v10 v10.11.2 h1:q3SHpufmypg+erIExEKUmsgmhDTyhcJ38oeKGACXohU=
|
||||
github.com/go-playground/validator/v10 v10.11.2/go.mod h1:NieE624vt4SCTJtD87arVLvdmjPAeV8BQlHtMnw9D7s=
|
||||
github.com/go-sql-driver/mysql v1.6.0 h1:BCTh4TKNUYmOmMUcQ3IipzF5prigylS7XXjEkfCHuOE=
|
||||
github.com/go-sql-driver/mysql v1.6.0/go.mod h1:DCzpHaOWr8IXmIStZouvnhqoel9Qv2LBy8hT2VhHyBg=
|
||||
github.com/go-stack/stack v1.8.0/go.mod h1:v0f6uXyyMGvRgIKkXu+yp6POWl0qKG85gN/melR3HDY=
|
||||
github.com/goccy/go-json v0.9.7/go.mod h1:6MelG93GURQebXPDq3khkgXZkazVtN9CRI+MGFi0w8I=
|
||||
github.com/goccy/go-json v0.9.11 h1:/pAaQDLHEoCq/5FFmSKBswWmK6H0e8g4159Kc/X/nqk=
|
||||
github.com/goccy/go-json v0.9.11/go.mod h1:6MelG93GURQebXPDq3khkgXZkazVtN9CRI+MGFi0w8I=
|
||||
github.com/goccy/go-json v0.10.0 h1:mXKd9Qw4NuzShiRlOXKews24ufknHO7gx30lsDyokKA=
|
||||
github.com/goccy/go-json v0.10.0/go.mod h1:6MelG93GURQebXPDq3khkgXZkazVtN9CRI+MGFi0w8I=
|
||||
github.com/gofrs/uuid v4.0.0+incompatible h1:1SD/1F5pU8p29ybwgQSwpQk+mwdRrXCYuPhW6m+TnJw=
|
||||
github.com/gofrs/uuid v4.0.0+incompatible/go.mod h1:b2aQJv3Z4Fp6yNu3cdSllBxTCLRxnplIgP/c0N/04lM=
|
||||
github.com/gogo/protobuf v1.3.2 h1:Ov1cvc58UF3b5XjBnZv7+opcTcQFZebYjWzi34vdm4Q=
|
||||
@@ -161,6 +169,8 @@ github.com/json-iterator/go v1.1.12 h1:PV8peI4a0ysnczrg+LtxykD8LfKY9ML6u2jnxaEnr
|
||||
github.com/json-iterator/go v1.1.12/go.mod h1:e30LSqwooZae/UwlEbR2852Gd8hjQvJoHmT4TnhNGBo=
|
||||
github.com/kisielk/errcheck v1.5.0/go.mod h1:pFxgyoBC7bSaBwPgfKdkLd5X25qrDl4LWUI2bnpBCr8=
|
||||
github.com/kisielk/gotool v1.0.0/go.mod h1:XhKaO+MFFWcvkIS/tQcRk01m1F5IRFswLeQ+oQHNcck=
|
||||
github.com/klauspost/cpuid/v2 v2.0.9 h1:lgaqFMSdTdQYdZ04uHyN2d/eKdOMyi2YLSvlQIBFYa4=
|
||||
github.com/klauspost/cpuid/v2 v2.0.9/go.mod h1:FInQzS24/EEf25PyTYn52gqo7WaD8xa0213Md/qVLRg=
|
||||
github.com/koding/multiconfig v0.0.0-20171124222453-69c27309b2d7 h1:SWlt7BoQNASbhTUD0Oy5yysI2seJ7vWuGUp///OM4TM=
|
||||
github.com/koding/multiconfig v0.0.0-20171124222453-69c27309b2d7/go.mod h1:Y2SaZf2Rzd0pXkLVhLlCiAXFCLSXAIbTKDivVgff/AM=
|
||||
github.com/konsorten/go-windows-terminal-sequences v1.0.1/go.mod h1:T0+1ngSBFLxvqU3pZ+m/2kptfBszLMUkC4ZK/EgS/cQ=
|
||||
@@ -190,8 +200,8 @@ github.com/mattn/go-isatty v0.0.5/go.mod h1:Iq45c/XA43vh69/j3iqttzPXn0bhXyGjM0Hd
|
||||
github.com/mattn/go-isatty v0.0.7/go.mod h1:Iq45c/XA43vh69/j3iqttzPXn0bhXyGjM0Hdxcsrc5s=
|
||||
github.com/mattn/go-isatty v0.0.12/go.mod h1:cbi8OIDigv2wuxKPP5vlRcQ1OAZbq2CE4Kysco4FUpU=
|
||||
github.com/mattn/go-isatty v0.0.14/go.mod h1:7GGIvUiUoEMVVmxf/4nioHXj79iQHKdU27kJ6hsGG94=
|
||||
github.com/mattn/go-isatty v0.0.16 h1:bq3VjFmv/sOjHtdEhmkEV4x1AJtvUvOJ2PFAZ5+peKQ=
|
||||
github.com/mattn/go-isatty v0.0.16/go.mod h1:kYGgaQfpe5nmfYZH+SKPsOc2e4SrIfOl2e/yFXSvRLM=
|
||||
github.com/mattn/go-isatty v0.0.17 h1:BTarxUcIeDqL27Mc+vyvdWYSL28zpIhv3RoTdsLMPng=
|
||||
github.com/mattn/go-isatty v0.0.17/go.mod h1:kYGgaQfpe5nmfYZH+SKPsOc2e4SrIfOl2e/yFXSvRLM=
|
||||
github.com/matttproud/golang_protobuf_extensions v1.0.4 h1:mmDVorXM7PCGKw94cs5zkfA9PSy5pEvNWRP0ET0TIVo=
|
||||
github.com/matttproud/golang_protobuf_extensions v1.0.4/go.mod h1:BSXmuO+STAnVfrANrmjBb36TMTDstsz7MSK+HVaYKv4=
|
||||
github.com/modern-go/concurrent v0.0.0-20180228061459-e0a39a4cb421/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q=
|
||||
@@ -222,6 +232,8 @@ github.com/prometheus/procfs v0.8.0 h1:ODq8ZFEaYeCaZOJlZZdJA2AbQR98dSHSM1KW/You5
|
||||
github.com/prometheus/procfs v0.8.0/go.mod h1:z7EfXMXOkbkqb9IINtpCn86r/to3BnA0uaxHdg830/4=
|
||||
github.com/prometheus/prometheus v2.5.0+incompatible h1:7QPitgO2kOFG8ecuRn9O/4L9+10He72rVRJvMXrE9Hg=
|
||||
github.com/prometheus/prometheus v2.5.0+incompatible/go.mod h1:oAIUtOny2rjMX0OWN5vPR5/q/twIROJvdqnQKDdil/s=
|
||||
github.com/rakyll/statik v0.1.7 h1:OF3QCZUuyPxuGEP7B4ypUa7sB/iHtqOTDYZXGM8KOdQ=
|
||||
github.com/rakyll/statik v0.1.7/go.mod h1:AlZONWzMtEnMs7W4e/1LURLiI49pIMmp6V9Unghqrcc=
|
||||
github.com/redis/go-redis/v9 v9.0.2 h1:BA426Zqe/7r56kCcvxYLWe1mkaz71LKF77GwgFzSxfE=
|
||||
github.com/redis/go-redis/v9 v9.0.2/go.mod h1:/xDTe9EF1LM61hek62Poq2nzQSGj0xSrEtEHbBQevps=
|
||||
github.com/robfig/go-cache v0.0.0-20130306151617-9fc39e0dbf62/go.mod h1:65XQgovT59RWatovFwnwocoUxiI/eENTnOY5GK3STuY=
|
||||
@@ -265,11 +277,14 @@ github.com/tidwall/pretty v1.2.0 h1:RWIZEg2iJ8/g6fDDYzMpobmaoGh5OLl4AXtGUGPcqCs=
|
||||
github.com/tidwall/pretty v1.2.0/go.mod h1:ITEVvHYasfjBbM0u2Pg8T2nJnzm8xPwvNhhsoaGGjNU=
|
||||
github.com/toolkits/pkg v1.3.3 h1:qpQAQ18Jr47dv4NcBALlH0ad7L2PuqSh5K+nJKNg5lU=
|
||||
github.com/toolkits/pkg v1.3.3/go.mod h1:USXArTJlz1f1DCnQHNPYugO8GPkr1NRhP4eYQZQVshk=
|
||||
github.com/twitchyliquid64/golang-asm v0.15.1 h1:SU5vSMR7hnwNxj24w34ZyCi/FmDZTkS4MhqMhdFk5YI=
|
||||
github.com/twitchyliquid64/golang-asm v0.15.1/go.mod h1:a1lVb/DtPvCB8fslRZhAngC2+aY1QWCk3Cedj/Gdt08=
|
||||
github.com/ugorji/go v1.1.7/go.mod h1:kZn38zHttfInRq0xu/PH0az30d+z6vm202qpg1oXVMw=
|
||||
github.com/ugorji/go v1.2.7/go.mod h1:nF9osbDWLy6bDVv/Rtoh6QgnvNDpmCalQV5urGCCS6M=
|
||||
github.com/ugorji/go/codec v1.1.7/go.mod h1:Ax+UKWsSmolVDwsd+7N3ZtXu+yMGCf907BLYF3GoBXY=
|
||||
github.com/ugorji/go/codec v1.2.7 h1:YPXUKf7fYbp/y8xloBqZOw2qaVggbfwMlI8WM3wZUJ0=
|
||||
github.com/ugorji/go/codec v1.2.7/go.mod h1:WGN1fab3R1fzQlVQTkfxVtIBhWDRqOviHU95kRgeqEY=
|
||||
github.com/ugorji/go/codec v1.2.9 h1:rmenucSohSTiyL09Y+l2OCk+FrMxGMzho2+tjr5ticU=
|
||||
github.com/ugorji/go/codec v1.2.9/go.mod h1:UNopzCgEMSXjBc6AOMqYvWC1ktqTAfzJZUZgYf6w6lg=
|
||||
github.com/yuin/goldmark v1.1.27/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74=
|
||||
github.com/yuin/goldmark v1.2.1/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74=
|
||||
github.com/yuin/goldmark v1.4.13/go.mod h1:6yULJ656Px+3vBD8DxQVa3kxgyrAnzto9xy5taEt/CY=
|
||||
@@ -287,6 +302,8 @@ go.uber.org/tools v0.0.0-20190618225709-2cfd321de3ee/go.mod h1:vJERXedbb3MVM5f9E
|
||||
go.uber.org/zap v1.9.1/go.mod h1:vwi/ZaCAaUcBkycHslxD9B2zi4UTXhF60s6SWpuDF0Q=
|
||||
go.uber.org/zap v1.10.0/go.mod h1:vwi/ZaCAaUcBkycHslxD9B2zi4UTXhF60s6SWpuDF0Q=
|
||||
go.uber.org/zap v1.13.0/go.mod h1:zwrFLgMcdUuIBviXEYEH1YKNaOBnKXsx2IPda5bBwHM=
|
||||
golang.org/x/arch v0.0.0-20210923205945-b76863e36670 h1:18EFjUmQOcUvxNYSkA6jO9VAiXCnxFY6NyDX0bHDmkU=
|
||||
golang.org/x/arch v0.0.0-20210923205945-b76863e36670/go.mod h1:5om86z9Hs0C8fWVUuoMHwpExlXzs5Tkyp9hOrfG7pp8=
|
||||
golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w=
|
||||
golang.org/x/crypto v0.0.0-20190411191339-88737f569e3a/go.mod h1:WFFai1msRO1wXaEeE5yQxYXgSfI8pQAWXbQop6sCtWE=
|
||||
golang.org/x/crypto v0.0.0-20190510104115-cbcb75029529/go.mod h1:yigFU9vqHzYiE8UmvKecakEJjdnWj3jj499lnFckfCI=
|
||||
@@ -297,11 +314,10 @@ golang.org/x/crypto v0.0.0-20201203163018-be400aefbc4c/go.mod h1:jdWPYTVW3xRLrWP
|
||||
golang.org/x/crypto v0.0.0-20210616213533-5ff15b29337e/go.mod h1:GvvjBRRGRdwPK5ydBHafDWAxML/pGHZbMvKqRZ5+Abc=
|
||||
golang.org/x/crypto v0.0.0-20210711020723-a769d52b0f97/go.mod h1:GvvjBRRGRdwPK5ydBHafDWAxML/pGHZbMvKqRZ5+Abc=
|
||||
golang.org/x/crypto v0.0.0-20210921155107-089bfa567519/go.mod h1:GvvjBRRGRdwPK5ydBHafDWAxML/pGHZbMvKqRZ5+Abc=
|
||||
golang.org/x/crypto v0.0.0-20211215153901-e495a2d5b3d3/go.mod h1:IxCIyHEi3zRg3s0A5j5BB6A9Jmi73HwBIUl50j+osU4=
|
||||
golang.org/x/crypto v0.0.0-20220622213112-05595931fe9d/go.mod h1:IxCIyHEi3zRg3s0A5j5BB6A9Jmi73HwBIUl50j+osU4=
|
||||
golang.org/x/crypto v0.0.0-20220722155217-630584e8d5aa/go.mod h1:IxCIyHEi3zRg3s0A5j5BB6A9Jmi73HwBIUl50j+osU4=
|
||||
golang.org/x/crypto v0.1.0 h1:MDRAIl0xIo9Io2xV565hzXHw3zVseKrJKodhohM5CjU=
|
||||
golang.org/x/crypto v0.1.0/go.mod h1:RecgLatLF4+eUMCP1PoPZQb+cVrJcOPbHkTkbkB9sbw=
|
||||
golang.org/x/crypto v0.5.0 h1:U/0M97KRkSFvyD/3FSmdP5W5swImpNgle/EHFhOsQPE=
|
||||
golang.org/x/crypto v0.5.0/go.mod h1:NK/OQwhpMQP3MwtdjgLlYHnH9ebylxKWv3e0fK+mkQU=
|
||||
golang.org/x/exp v0.0.0-20190121172915-509febef88a4/go.mod h1:CJ0aWSM057203Lf6IL+f9T1iT9GByDxfZKAQTCR3kQA=
|
||||
golang.org/x/lint v0.0.0-20181026193005-c67002cb31c3/go.mod h1:UVdnD1Gm6xHRNCYTkRU2/jEulfH38KcIWyp/GAMgvoE=
|
||||
golang.org/x/lint v0.0.0-20190227174305-5b3e6a55c961/go.mod h1:wehouNa3lNwaWXcvxsM5YxQ5yQlVC4a0KAMCusXpPoU=
|
||||
@@ -449,3 +465,4 @@ gorm.io/gorm v1.24.2/go.mod h1:DVrVomtaYTbqs7gB/x2uVvqnXzv0nqjB396B8cG4dBA=
|
||||
honnef.co/go/tools v0.0.0-20190102054323-c2f93a96b099/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4=
|
||||
honnef.co/go/tools v0.0.0-20190523083050-ea95bdfd59fc/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4=
|
||||
honnef.co/go/tools v0.0.1-2019.2.3/go.mod h1:a3bituU0lyd329TUQxRnasdCoJDkEUEAqEt0JzvZhAg=
|
||||
rsc.io/pdf v0.1.1/go.mod h1:n8OzWcQ6Sp37PL01nO98y4iUCRdTGarVfzxY20ICaU4=
|
||||
|
||||
@@ -1,41 +1,56 @@
|
||||
{
|
||||
"name": "TCP detection",
|
||||
"name": "TCP detection by UlricQin",
|
||||
"tags": "",
|
||||
"ident": "",
|
||||
"configs": {
|
||||
"panels": [
|
||||
{
|
||||
"collapsed": true,
|
||||
"id": "b90370ef-ee1c-40c3-a570-e26a89448209",
|
||||
"layout": {
|
||||
"h": 1,
|
||||
"i": "b90370ef-ee1c-40c3-a570-e26a89448209",
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 0
|
||||
},
|
||||
"name": "Default chart group",
|
||||
"type": "row"
|
||||
},
|
||||
{
|
||||
"custom": {
|
||||
"aggrDimension": "target",
|
||||
"calc": "lastNotNull",
|
||||
"displayMode": "labelValuesToRows",
|
||||
"showHeader": true
|
||||
},
|
||||
"type": "table",
|
||||
"id": "73c6eaf9-1685-4a7a-bf53-3d52afa1792e",
|
||||
"layout": {
|
||||
"h": 15,
|
||||
"i": "73c6eaf9-1685-4a7a-bf53-3d52afa1792e",
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 1
|
||||
"y": 0,
|
||||
"i": "73c6eaf9-1685-4a7a-bf53-3d52afa1792e",
|
||||
"isResizable": true
|
||||
},
|
||||
"version": "3.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"datasourceValue": "${prom}",
|
||||
"targets": [
|
||||
{
|
||||
"expr": "max(net_response_result_code) by (target)",
|
||||
"legend": "UP?",
|
||||
"refId": "A"
|
||||
},
|
||||
{
|
||||
"expr": "max(net_response_response_time) by (target) * 1000",
|
||||
"legend": "Latency(ms)",
|
||||
"refId": "C"
|
||||
}
|
||||
],
|
||||
"transformations": [
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {
|
||||
"indexByName": {
|
||||
"target": 0
|
||||
}
|
||||
}
|
||||
}
|
||||
],
|
||||
"name": "Targets",
|
||||
"custom": {
|
||||
"showHeader": true,
|
||||
"colorMode": "background",
|
||||
"calc": "lastNotNull",
|
||||
"displayMode": "labelValuesToRows",
|
||||
"aggrDimension": "target"
|
||||
},
|
||||
"options": {
|
||||
"standardOptions": {},
|
||||
"valueMappings": []
|
||||
"valueMappings": [],
|
||||
"standardOptions": {}
|
||||
},
|
||||
"overrides": [
|
||||
{
|
||||
@@ -50,7 +65,7 @@
|
||||
"special": 0
|
||||
},
|
||||
"result": {
|
||||
"color": "#417505",
|
||||
"color": "#2c9d3d",
|
||||
"text": "UP"
|
||||
},
|
||||
"type": "special"
|
||||
@@ -68,33 +83,49 @@
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
],
|
||||
"targets": [
|
||||
{
|
||||
"expr": "max(net_response_result_code) by (target)",
|
||||
"legend": "UP?",
|
||||
"refId": "A"
|
||||
},
|
||||
{
|
||||
"expr": "max(net_response_response_time) by (target)",
|
||||
"legend": "latency(s)",
|
||||
"refId": "C"
|
||||
"type": "special",
|
||||
"matcher": {
|
||||
"value": "C"
|
||||
},
|
||||
"properties": {
|
||||
"valueMappings": [
|
||||
{
|
||||
"type": "range",
|
||||
"result": {
|
||||
"color": "#f10c0c"
|
||||
},
|
||||
"match": {
|
||||
"from": 1
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "range",
|
||||
"result": {
|
||||
"color": "#2c9d3d"
|
||||
},
|
||||
"match": {
|
||||
"to": 1
|
||||
}
|
||||
}
|
||||
],
|
||||
"standardOptions": {
|
||||
"util": "milliseconds",
|
||||
"decimals": 3
|
||||
}
|
||||
}
|
||||
}
|
||||
],
|
||||
"type": "table",
|
||||
"version": "2.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"datasourceValue": "${prom}"
|
||||
]
|
||||
}
|
||||
],
|
||||
"version": "3.0.0",
|
||||
"var": [
|
||||
{
|
||||
"definition": "prometheus",
|
||||
"name": "prom",
|
||||
"type": "datasource",
|
||||
"definition": "prometheus"
|
||||
"type": "datasource"
|
||||
}
|
||||
]
|
||||
],
|
||||
"version": "3.0.0"
|
||||
}
|
||||
}
|
||||
BIN
integrations/Network/icon/network.png
Normal file
|
After Width: | Height: | Size: 888 B |
10
integrations/SNMP/dashboards/placeholder.json
Normal file
@@ -0,0 +1,10 @@
|
||||
{
|
||||
"name": "Placeholder, waiting for a PR from the veterans",
|
||||
"tags": "",
|
||||
"ident": "",
|
||||
"configs": {
|
||||
"var": [],
|
||||
"panels": [],
|
||||
"version": "3.0.0"
|
||||
}
|
||||
}
|
||||
BIN
integrations/SNMP/icon/snmp.png
Normal file
|
After Width: | Height: | Size: 813 B |
56
integrations/SNMP/markdown/README.md
Normal file
@@ -0,0 +1,56 @@
Network devices are monitored mainly over the SNMP protocol; Categraf, Telegraf, Datadog-Agent, and snmp_exporter all provide this capability.

## snmp

Starting with v0.2.13, Categraf integrates Telegraf's snmp plugin, and this plugin is the recommended way to monitor network devices. Its core idea is simple: to collect a metric, configure the corresponding oid. Values collected from some oids can also be used as labels on the time series, which makes the plugin very flexible.

There is a downside, of course. The SNMP ecosystem contains a large number of private oids: for example, the oids for CPU and memory utilization differ from device to device. Different device models therefore need different configurations, which is tedious to maintain and takes a lot of accumulated experience. I encourage everyone to contribute collection configurations for different device models [here](https://github.com/flashcatcloud/categraf/tree/main/inputs/snmp), one folder per model; built up over time, this will benefit everyone. If you are not sure how to open a PR, feel free to contact us.

That said, there is no need to be too pessimistic. For network devices, most monitoring data can be collected with generic oids. For example:

```toml
interval = 60

[[instances]]
agents = ["udp://172.30.15.189:161"]

interval_times = 1
timeout = "5s"
version = 2
community = "public"
# If agent_host_tag is set to ident, this switch appears as a monitored object in Nightingale's monitored-object list
# Choose according to your needs; I personally suggest setting agent_host_tag to switch_ip
agent_host_tag = "ident"
retries = 1

[[instances.field]]
oid = "RFC1213-MIB::sysUpTime.0"
name = "uptime"

[[instances.field]]
oid = "RFC1213-MIB::sysName.0"
name = "source"
is_tag = true

[[instances.table]]
oid = "IF-MIB::ifTable"
name = "interface"
inherit_tags = ["source"]

[[instances.table.field]]
oid = "IF-MIB::ifDescr"
name = "ifDescr"
is_tag = true

```

The sample above is a v2 configuration. For v3, the authentication settings look like this:

```toml
version = 3
sec_name = "managev3user"
auth_protocol = "SHA"
auth_password = "example.Demo.c0m"
```

Also, it is recommended to deploy a dedicated Categraf for SNMP collection, because different monitored objects may need different collection frequencies. For edge switches, collecting once every 5 minutes is enough, while core switches can be collected more often, for example every 60s or 120s. How do you adjust the collection frequency? Use settings such as interval and interval_times; for details, see the video tutorial in "[Explaining the Categraf Collector](https://mp.weixin.qq.com/s/T69kkBzToHVh31D87xsrIg)".
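As a rough sketch of the frequency tuning mentioned in the README, assuming (as Categraf's sample configs describe) that an instance's effective collection interval is the top-level `interval` multiplied by that instance's `interval_times`, a core switch and an edge switch could be configured as two instances. The IP addresses are hypothetical placeholders, and only one field per instance is shown to keep the sketch short:

```toml
# Base interval in seconds; each instance multiplies it by interval_times.
interval = 60

# Core switch: 60s * 1 = collected every 60 seconds (assumed semantics).
[[instances]]
agents = ["udp://10.0.0.1:161"]  # hypothetical address
interval_times = 1
timeout = "5s"
version = 2
community = "public"
agent_host_tag = "ident"

[[instances.field]]
oid = "RFC1213-MIB::sysUpTime.0"
name = "uptime"

# Edge switch: 60s * 5 = collected every 5 minutes (assumed semantics).
[[instances]]
agents = ["udp://10.0.1.1:161"]  # hypothetical address
interval_times = 5
timeout = "5s"
version = 2
community = "public"
agent_host_tag = "ident"

[[instances.field]]
oid = "RFC1213-MIB::sysUpTime.0"
name = "uptime"
```

Keeping devices with very different frequencies in separate instances (or in a separate Categraf, as the README suggests) avoids polling slow edge devices as often as core devices.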
1115
integrations/TDEngine/dashboards/tasokeeper3.x.json
Normal file
BIN
integrations/TDEngine/icon/taos.png
Normal file
|
After Width: | Height: | Size: 14 KiB |
614
integrations/canal/dashboards/canal_by_categraf.json
Normal file
@@ -0,0 +1,614 @@
|
||||
{
|
||||
"name": "Canal instances",
|
||||
"tags": "",
|
||||
"configs": {
|
||||
"version": "2.0.0",
|
||||
"links": [],
|
||||
"var": [
|
||||
{
|
||||
"name": "destination",
|
||||
"allOption": false,
|
||||
"multi": false,
|
||||
"definition": "label_values(canal_instance, destination)"
|
||||
}
|
||||
],
|
||||
"panels": [
|
||||
{
|
||||
"version": "2.0.0",
|
||||
"id": "758ed076-0140-4755-bd86-da18d0648fdd",
|
||||
"type": "row",
|
||||
"name": "Instance status",
|
||||
"collapsed": true,
|
||||
"layout": {
|
||||
"h": 1,
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 0,
|
||||
"i": "758ed076-0140-4755-bd86-da18d0648fdd"
|
||||
},
|
||||
"panels": []
|
||||
},
|
||||
{
|
||||
"version": "2.0.0",
|
||||
"id": "0c611d83-9ccb-402f-b3ed-14d53bd3e818",
|
||||
"type": "timeseries",
|
||||
"name": "Basic",
|
||||
"description": "Canal instance basic information.",
|
||||
"links": [],
|
||||
"layout": {
|
||||
"h": 5,
|
||||
"w": 6,
|
||||
"x": 0,
|
||||
"y": 1,
|
||||
"i": "0c611d83-9ccb-402f-b3ed-14d53bd3e818"
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "canal_instance{destination=~\"$destination\"}",
|
||||
"legend": "Destination: {{destination}}"
|
||||
},
|
||||
{
|
||||
"refId": "B",
|
||||
"expr": "canal_instance_parser_mode{destination=~\"$destination\"}",
|
||||
"legend": "Parallel parser: {{parallel}}"
|
||||
},
|
||||
{
|
||||
"refId": "C",
|
||||
"expr": "canal_instance_store{destination=~\"$destination\"}",
|
||||
"legend": "Batch mode: {{batchMode}}"
|
||||
},
|
||||
{
|
||||
"refId": "D",
|
||||
"expr": "canal_instance_store{destination=~\"$destination\"}",
|
||||
"legend": "Buffer size: {{size}}"
|
||||
}
|
||||
],
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden"
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"version": "2.0.0",
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "linear",
|
||||
"fillOpacity": 0.5,
|
||||
"stack": "off"
|
||||
}
|
||||
},
|
||||
{
|
||||
"version": "2.0.0",
|
||||
"id": "efde62a5-f4ac-4062-80d1-4cc7dd50bb9f",
|
||||
"type": "timeseries",
|
||||
"name": "Network bandwidth",
|
||||
"description": "Network bandwidth used by the Canal instance.\ninbound: reading the MySQL binlog.\noutbound: sending the formatted binlog to the client.",
|
||||
"links": [],
|
||||
"layout": {
|
||||
"h": 5,
|
||||
"w": 6,
|
||||
"x": 6,
|
||||
"y": 1,
|
||||
"i": "efde62a5-f4ac-4062-80d1-4cc7dd50bb9f"
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "rate(canal_instance_received_binlog_bytes{destination=~\"$destination\", parser=\"0\"}[2m]) / 1024",
|
||||
"legend": "inbound"
|
||||
},
|
||||
{
|
||||
"refId": "B",
|
||||
"expr": "rate(canal_instance_client_bytes{destination=~\"$destination\"}[2m]) / 1024",
|
||||
"legend": "outbound"
|
||||
},
|
||||
{
|
||||
"refId": "C",
|
||||
"expr": "rate(canal_instance_received_binlog_bytes{destination=~\"$destination\", parser=\"1\"}[2m]) / 1024",
|
||||
"legend": "inbound-1"
|
||||
},
|
||||
{
|
||||
"refId": "D",
|
||||
"expr": "rate(canal_instance_received_binlog_bytes{destination=~\"$destination\", parser=\"2\"}[2m]) / 1024",
|
||||
"legend": "inbound-2"
|
||||
}
|
||||
],
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden"
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"version": "2.0.0",
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "linear",
|
||||
"fillOpacity": 0.5,
|
||||
"stack": "off"
|
||||
}
|
||||
},
|
||||
{
|
||||
"version": "2.0.0",
|
||||
"id": "93d407a9-c1bf-4f9c-b88b-f0a01c023ea4",
|
||||
"type": "timeseries",
|
||||
"name": "Delay",
|
||||
"description": "master: delay of the Canal server relative to the MySQL master; the master heartbeat mechanism refreshes the delay while idle.\nput: measured from the time of the store put operation.\nget: measured from the time of the client get operation.\nack: measured from the time of the client ack operation.",
|
||||
"links": [],
|
||||
"layout": {
|
||||
"h": 5,
|
||||
"w": 6,
|
||||
"x": 12,
|
||||
"y": 1,
|
||||
"i": "93d407a9-c1bf-4f9c-b88b-f0a01c023ea4"
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"refId": "D",
|
||||
"expr": "canal_instance_traffic_delay{destination=~\"$destination\"} / 1000",
|
||||
"legend": "master"
|
||||
},
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "canal_instance_put_delay{destination=~\"$destination\"} / 1000",
|
||||
"legend": "put"
|
||||
},
|
||||
{
|
||||
"refId": "B",
|
||||
"expr": "canal_instance_get_delay{destination=~\"$destination\"} / 1000",
|
||||
"legend": "get"
|
||||
},
|
||||
{
|
||||
"refId": "C",
|
||||
"expr": "canal_instance_ack_delay{destination=~\"$destination\"} / 1000",
|
||||
"legend": "ack"
|
||||
}
|
||||
],
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden"
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"version": "2.0.0",
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "linear",
|
||||
"fillOpacity": 0.5,
|
||||
"stack": "off"
|
||||
}
|
||||
},
|
||||
{
|
||||
"version": "2.0.0",
|
||||
"id": "131cbcbe-29e7-469a-bb17-5914a8471ee7",
|
||||
"type": "timeseries",
|
||||
"name": "Blocking",
|
||||
"description": "Share of time the sink thread spends blocking; share of time the dump thread spends blocking (parallel mode only).",
|
||||
"links": [],
|
||||
"layout": {
|
||||
"h": 5,
|
||||
"w": 6,
|
||||
"x": 18,
|
||||
"y": 1,
|
||||
"i": "131cbcbe-29e7-469a-bb17-5914a8471ee7"
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"refId": "B",
|
||||
"expr": "clamp_max(rate(canal_instance_publish_blocking_time{destination=~\"$destination\", parser=\"0\"}[2m]), 1000) / 10",
|
||||
"legend": "dump"
|
||||
},
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "clamp_max(rate(canal_instance_sink_blocking_time{destination=~\"$destination\"}[2m]), 1000) / 10",
|
||||
"legend": "sink"
|
||||
},
|
||||
{
|
||||
"refId": "C",
|
||||
"expr": "clamp_max(rate(canal_instance_publish_blocking_time{destination=~\"$destination\", parser=\"1\"}[2m]), 1000) / 10",
|
||||
"legend": "dump-1"
|
||||
},
|
||||
{
|
||||
"refId": "D",
|
||||
"expr": "clamp_max(rate(canal_instance_publish_blocking_time{destination=~\"$destination\", parser=\"2\"}[2m]), 1000) / 10",
|
||||
"legend": "dump-2"
|
||||
}
|
||||
],
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden"
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"version": "2.0.0",
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "linear",
|
||||
"fillOpacity": 0.5,
|
||||
"stack": "off"
|
||||
}
|
||||
},
|
||||
{
|
||||
"version": "2.0.0",
|
||||
"id": "de248e75-37cb-4536-874c-fbdd61a4a6a4",
|
||||
"type": "row",
|
||||
"name": "Throughput",
|
||||
"collapsed": true,
|
||||
"layout": {
|
||||
"h": 1,
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 6,
|
||||
"i": "de248e75-37cb-4536-874c-fbdd61a4a6a4"
|
||||
},
|
||||
"panels": []
|
||||
},
|
||||
{
|
||||
"version": "2.0.0",
|
||||
"id": "16e7b311-3e9e-4c17-874e-4ef3beb8779f",
|
||||
"type": "timeseries",
|
||||
"name": "TPS(table rows)",
|
||||
"description": "TPS of binlog processing by the instance (measured in table rows changed on the master).\nput: TPS of put operations.\nget: TPS of get operations.\nack: TPS of ack operations.",
|
||||
"links": [],
|
||||
"layout": {
|
||||
"h": 5,
|
||||
"w": 6,
|
||||
"x": 0,
|
||||
"y": 7,
|
||||
"i": "16e7b311-3e9e-4c17-874e-4ef3beb8779f"
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "rate(canal_instance_put_rows{destination=~\"$destination\"}[2m])",
|
||||
"legend": "put"
|
||||
},
|
||||
{
|
||||
"refId": "B",
|
||||
"expr": "rate(canal_instance_get_rows{destination=~\"$destination\"}[2m])",
|
||||
"legend": "get"
|
||||
},
|
||||
{
|
||||
"refId": "C",
|
||||
"expr": "rate(canal_instance_ack_rows{destination=~\"$destination\"}[2m])",
|
||||
"legend": "ack"
|
||||
}
|
||||
],
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden"
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"version": "2.0.0",
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "linear",
|
||||
"fillOpacity": 0.5,
|
||||
"stack": "off"
|
||||
}
|
||||
},
|
||||
{
|
||||
"version": "2.0.0",
|
||||
"id": "791852f6-dad5-43fd-8629-3f84f8b4ae85",
|
||||
"type": "timeseries",
|
||||
"name": "TPS(MySQL transaction)",
|
||||
"description": "TPS of binlog processing by the Canal instance, measured in MySQL transactions.",
|
||||
"links": [],
|
||||
"layout": {
|
||||
"h": 5,
|
||||
"w": 6,
|
||||
"x": 6,
|
||||
"y": 7,
|
||||
"i": "791852f6-dad5-43fd-8629-3f84f8b4ae85"
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "rate(canal_instance_transactions{destination=~\"$destination\"}[2m])",
|
||||
"legend": "transactions"
|
||||
}
|
||||
],
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden"
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"version": "2.0.0",
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "linear",
|
||||
"fillOpacity": 0.5,
|
||||
"stack": "off"
|
||||
}
|
||||
},
|
||||
{
|
||||
"version": "2.0.0",
|
||||
"id": "3de66e9d-a9ad-41a9-8a48-8911e382e1fe",
|
||||
"type": "row",
|
||||
"name": "Client",
|
||||
"collapsed": true,
|
||||
"layout": {
|
||||
"h": 1,
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 12,
|
||||
"i": "3de66e9d-a9ad-41a9-8a48-8911e382e1fe"
|
||||
},
|
||||
"panels": []
|
||||
},
|
||||
{
|
||||
"version": "2.0.0",
|
||||
"id": "74c61ee7-2c28-4ee4-997d-e9a4bd9e7183",
|
||||
"type": "timeseries",
|
||||
"name": "Client requests",
|
||||
"description": "Statistics of requests received by the Canal instance, grouped by packet type.",
|
||||
"links": [],
|
||||
"layout": {
|
||||
"h": 5,
|
||||
"w": 6,
|
||||
"x": 0,
|
||||
"y": 13,
|
||||
"i": "74c61ee7-2c28-4ee4-997d-e9a4bd9e7183"
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "canal_instance_client_packets{destination=~\"$destination\"}",
|
||||
"legend": "{{packetType}}"
|
||||
}
|
||||
],
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden"
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"version": "2.0.0",
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "linear",
|
||||
"fillOpacity": 0.5,
|
||||
"stack": "off"
|
||||
}
|
||||
},
|
||||
{
|
||||
"version": "2.0.0",
|
||||
"id": "037002c9-dda5-4ce2-b8a3-d14e8ef9440b",
|
||||
"type": "timeseries",
|
||||
"name": "Client QPS",
|
||||
"description": "QPS of GET and ACK packets sent by the client.",
|
||||
"links": [],
|
||||
"layout": {
|
||||
"h": 5,
|
||||
"w": 6,
|
||||
"x": 6,
|
||||
"y": 13,
|
||||
"i": "037002c9-dda5-4ce2-b8a3-d14e8ef9440b"
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "rate(canal_instance_client_packets{destination=~\"$destination\",packetType=\"GET\"}[2m])",
|
||||
"legend": "GET"
|
||||
},
|
||||
{
|
||||
"refId": "B",
|
||||
"expr": "rate(canal_instance_client_packets{destination=~\"$destination\",packetType=\"CLIENTACK\"}[2m])",
|
||||
"legend": "ACK"
|
||||
}
|
||||
],
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden"
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"version": "2.0.0",
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "linear",
|
||||
"fillOpacity": 0.5,
|
||||
"stack": "off"
|
||||
}
|
||||
},
|
||||
{
|
||||
"version": "2.0.0",
|
||||
"id": "2a433709-f08d-404a-acd6-3ef295157b83",
|
||||
"type": "timeseries",
|
||||
"name": "Empty packets",
|
||||
"description": "Share of GET requests to which the server responds with an empty packet.",
|
||||
"links": [],
|
||||
"layout": {
|
||||
"h": 5,
|
||||
"w": 6,
|
||||
"x": 12,
|
||||
"y": 13,
|
||||
"i": "2a433709-f08d-404a-acd6-3ef295157b83"
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "rate(canal_instance_client_empty_batches{destination=~\"$destination\"}[2m])",
|
||||
"legend": "empty"
|
||||
},
|
||||
{
|
||||
"refId": "B",
|
||||
"expr": "rate(canal_instance_client_packets{destination=~\"$destination\", packetType=\"GET\"}[2m])",
|
||||
"legend": "nonempty"
|
||||
}
|
||||
],
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden"
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"version": "2.0.0",
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "linear",
|
||||
"fillOpacity": 0.5,
|
||||
"stack": "off"
|
||||
}
|
||||
},
|
||||
{
|
||||
"version": "2.0.0",
|
||||
"id": "1c262e9e-1dc5-4aae-bf8c-0e4c3e85a84f",
|
||||
"type": "timeseries",
|
||||
"name": "Response time",
|
||||
"description": "Overview of Canal client request response times.",
|
||||
"links": [],
|
||||
"layout": {
|
||||
"h": 5,
|
||||
"w": 6,
|
||||
"x": 18,
|
||||
"y": 13,
|
||||
"i": "1c262e9e-1dc5-4aae-bf8c-0e4c3e85a84f"
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "rate(canal_instance_client_request_latency_bucket{destination=~\"$destination\"}[2m])",
|
||||
"legend": "{{le}}ms"
|
||||
}
|
||||
],
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden"
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"version": "2.0.0",
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "linear",
|
||||
"fillOpacity": 0.5,
|
||||
"stack": "off"
|
||||
}
|
||||
},
|
||||
{
|
||||
"version": "2.0.0",
|
||||
"id": "ae68fd29-4dd4-445a-86b9-76144d23d27c",
|
||||
"type": "row",
|
||||
"name": "Store",
|
||||
"collapsed": true,
|
||||
"layout": {
|
||||
"h": 1,
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 18,
|
||||
"i": "ae68fd29-4dd4-445a-86b9-76144d23d27c"
|
||||
},
|
||||
"panels": []
|
||||
},
|
||||
{
|
||||
"version": "2.0.0",
|
||||
"id": "27b7365c-2bca-42b7-836d-33dc769b2a4e",
|
||||
"type": "timeseries",
|
||||
"name": "Store remain events",
|
||||
"description": "Canal instance ringbuffer<65><72>δ<EFBFBD>ͷŵ<CDB7>events<74><73><EFBFBD><EFBFBD><EFBFBD><EFBFBD>",
|
||||
"links": [],
|
||||
"layout": {
|
||||
"h": 5,
|
||||
"w": 6,
|
||||
"x": 0,
|
||||
"y": 19,
|
||||
"i": "27b7365c-2bca-42b7-836d-33dc769b2a4e"
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "canal_instance_store_produce_seq{destination=~\"$destination\"} - canal_instance_store_consume_seq{destination=~\"$destination\"}",
|
||||
"legend": "events"
|
||||
}
|
||||
],
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden"
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"version": "2.0.0",
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "linear",
|
||||
"fillOpacity": 0.5,
|
||||
"stack": "off"
|
||||
}
|
||||
},
|
||||
{
|
||||
"version": "2.0.0",
|
||||
"id": "f399e86f-8f87-43fe-80a2-5b6f2ef7529f",
|
||||
"type": "timeseries",
|
||||
"name": "Store remain mem",
|
||||
"description": "Canal instance ringbuffer <20><>δ<EFBFBD>ͷ<EFBFBD>eventsռ<73><D5BC><EFBFBD>ڴ档",
|
||||
"links": [],
|
||||
"layout": {
|
||||
"h": 5,
|
||||
"w": 6,
|
||||
"x": 6,
|
||||
"y": 19,
|
||||
"i": "f399e86f-8f87-43fe-80a2-5b6f2ef7529f"
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "(canal_instance_store_produce_mem{destination=~\"$destination\"} - canal_instance_store_consume_mem{destination=~\"$destination\"}) / 1024",
|
||||
"legend": "memsize"
|
||||
}
|
||||
],
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden"
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"version": "2.0.0",
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "linear",
|
||||
"fillOpacity": 0.5,
|
||||
"stack": "off"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
BIN
integrations/canal/icon/canal.png
Normal file
|
After Width: | Height: | Size: 3.1 KiB |
@@ -1,5 +1,5 @@
|
||||
{
|
||||
"name": "ElasticSearch",
|
||||
"name": "ElasticSearch By Exporter",
|
||||
"tags": "Prometheus ElasticSearch ES",
|
||||
"ident": "",
|
||||
"configs": {
|
||||
|
||||
@@ -1,41 +1,68 @@
|
||||
{
|
||||
"name": "http detect",
|
||||
"name": "HTTP detect by UlricQin",
|
||||
"tags": "",
|
||||
"ident": "",
|
||||
"configs": {
|
||||
"panels": [
|
||||
{
|
||||
"collapsed": true,
|
||||
"id": "0cd7c8aa-456c-4522-97ef-0b1710e7af8a",
|
||||
"layout": {
|
||||
"h": 1,
|
||||
"i": "0cd7c8aa-456c-4522-97ef-0b1710e7af8a",
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 0
|
||||
},
|
||||
"name": "Default chart group",
|
||||
"type": "row"
|
||||
},
|
||||
{
|
||||
"custom": {
|
||||
"aggrDimension": "target",
|
||||
"calc": "lastNotNull",
|
||||
"displayMode": "labelValuesToRows",
|
||||
"showHeader": true
|
||||
},
|
||||
"type": "table",
|
||||
"id": "3674dbfa-243a-49f6-baa5-b7f887c1afb0",
|
||||
"layout": {
|
||||
"h": 15,
|
||||
"i": "3674dbfa-243a-49f6-baa5-b7f887c1afb0",
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 1
|
||||
"y": 0,
|
||||
"i": "3674dbfa-243a-49f6-baa5-b7f887c1afb0",
|
||||
"isResizable": true
|
||||
},
|
||||
"version": "3.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"datasourceValue": "${Datasource}",
|
||||
"targets": [
|
||||
{
|
||||
"expr": "max(http_response_result_code) by (target)",
|
||||
"legend": "UP?",
|
||||
"refId": "A",
|
||||
"instant": true
|
||||
},
|
||||
{
|
||||
"expr": "max(http_response_response_code) by (target)",
|
||||
"legend": "status code",
|
||||
"refId": "B",
|
||||
"instant": true
|
||||
},
|
||||
{
|
||||
"expr": "max(http_response_response_time) by (target) *1000",
|
||||
"legend": "latency",
|
||||
"refId": "C",
|
||||
"instant": true
|
||||
},
|
||||
{
|
||||
"expr": "max(http_response_cert_expire_timestamp) by (target) - time()",
|
||||
"legend": "cert expire",
|
||||
"refId": "D",
|
||||
"instant": true
|
||||
}
|
||||
],
|
||||
"transformations": [
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {}
|
||||
}
|
||||
],
|
||||
"name": "URL Details",
|
||||
"custom": {
|
||||
"showHeader": true,
|
||||
"colorMode": "background",
|
||||
"calc": "lastNotNull",
|
||||
"displayMode": "labelValuesToRows",
|
||||
"aggrDimension": "target",
|
||||
"sortColumn": "target",
|
||||
"sortOrder": "ascend"
|
||||
},
|
||||
"options": {
|
||||
"standardOptions": {},
|
||||
"valueMappings": []
|
||||
"valueMappings": [],
|
||||
"standardOptions": {}
|
||||
},
|
||||
"overrides": [
|
||||
{
|
||||
@@ -76,43 +103,115 @@
|
||||
"properties": {
|
||||
"standardOptions": {
|
||||
"util": "humantimeSeconds"
|
||||
}
|
||||
},
|
||||
"valueMappings": [
|
||||
{
|
||||
"type": "range",
|
||||
"result": {
|
||||
"color": "#f60c0c"
|
||||
},
|
||||
"match": {
|
||||
"to": 604800
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "range",
|
||||
"result": {
|
||||
"color": "#ffae39"
|
||||
},
|
||||
"match": {
|
||||
"to": 2592000
|
||||
}
|
||||
}
|
||||
]
|
||||
},
|
||||
"type": "special"
|
||||
},
|
||||
{
|
||||
"type": "special",
|
||||
"matcher": {
|
||||
"value": "B"
|
||||
},
|
||||
"properties": {
|
||||
"valueMappings": [
|
||||
{
|
||||
"type": "range",
|
||||
"result": {
|
||||
"color": "#2c9d3d"
|
||||
},
|
||||
"match": {
|
||||
"to": 399
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "range",
|
||||
"result": {
|
||||
"color": "#ff656b"
|
||||
},
|
||||
"match": {
|
||||
"to": 499
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "range",
|
||||
"result": {
|
||||
"color": "#f10808"
|
||||
},
|
||||
"match": {
|
||||
"from": 500
|
||||
}
|
||||
}
|
||||
],
|
||||
"standardOptions": {}
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "special",
|
||||
"matcher": {
|
||||
"value": "C"
|
||||
},
|
||||
"properties": {
|
||||
"valueMappings": [
|
||||
{
|
||||
"type": "range",
|
||||
"result": {
|
||||
"color": "#2c9d3d"
|
||||
},
|
||||
"match": {
|
||||
"to": 400
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "range",
|
||||
"result": {
|
||||
"color": "#ff656b"
|
||||
},
|
||||
"match": {
|
||||
"from": 400
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "range",
|
||||
"result": {
|
||||
"color": "#f11313"
|
||||
},
|
||||
"match": {
|
||||
"from": 2000
|
||||
}
|
||||
}
|
||||
],
|
||||
"standardOptions": {
|
||||
"util": "milliseconds"
|
||||
}
|
||||
}
|
||||
}
|
||||
],
|
||||
"targets": [
|
||||
{
|
||||
"expr": "max(http_response_result_code) by (target)",
|
||||
"legend": "UP?",
|
||||
"refId": "A"
|
||||
},
|
||||
{
|
||||
"expr": "max(http_response_response_code) by (target)",
|
||||
"legend": "status code",
|
||||
"refId": "B"
|
||||
},
|
||||
{
|
||||
"expr": "max(http_response_response_time) by (target)",
|
||||
"legend": "latency(s)",
|
||||
"refId": "C"
|
||||
},
|
||||
{
|
||||
"expr": "max(http_response_cert_expire_timestamp) by (target) - time()",
|
||||
"legend": "cert expire",
|
||||
"refId": "D"
|
||||
}
|
||||
],
|
||||
"type": "table",
|
||||
"version": "2.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"datasourceValue": "${prom}"
|
||||
]
|
||||
}
|
||||
],
|
||||
"version": "3.0.0",
|
||||
"var": [
|
||||
{
|
||||
"name": "prom",
|
||||
"name": "Datasource",
|
||||
"type": "datasource",
|
||||
"definition": "prometheus"
|
||||
}
|
||||
|
||||
|
Before Width: | Height: | Size: 1.4 KiB |
BIN
integrations/http/icon/http_response.png
Normal file
|
After Width: | Height: | Size: 975 B |
266
integrations/kubernetes/alerts/apiserver.json
Normal file
@@ -0,0 +1,266 @@
|
||||
[
|
||||
{
|
||||
"name": "KubeClientCertificateExpiration-S2",
|
||||
"note": "A client certificate used to authenticate to the apiserver is expiring in less than 7.0 days.",
|
||||
"severity": 2,
|
||||
"disabled": 0,
|
||||
"prom_for_duration": 0,
|
||||
"prom_ql": "apiserver_client_certificate_expiration_seconds_count{job=\"apiserver\"} > 0 and on(job) histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job=\"apiserver\"}[5m]))) < 604800\n",
|
||||
"prom_eval_interval": 15,
|
||||
"enable_stime": "00:00",
|
||||
"enable_etime": "23:59",
|
||||
"enable_days_of_week": [
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6",
|
||||
"0"
|
||||
],
|
||||
"enable_in_bg": 0,
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [],
|
||||
"notify_repeat_step": 60,
|
||||
"recover_duration": 0,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": []
|
||||
},
|
||||
{
|
||||
"name": "KubeClientCertificateExpiration-S1",
|
||||
"note": "A client certificate used to authenticate to the apiserver is expiring in less than 24.0 hours.",
|
||||
"severity": 1,
|
||||
"disabled": 0,
|
||||
"prom_for_duration": 0,
|
||||
"prom_ql": "apiserver_client_certificate_expiration_seconds_count{job=\"apiserver\"} > 0 and on(job) histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job=\"apiserver\"}[5m]))) < 86400\n",
|
||||
"prom_eval_interval": 15,
|
||||
"enable_stime": "00:00",
|
||||
"enable_etime": "23:59",
|
||||
"enable_days_of_week": [
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6",
|
||||
"0"
|
||||
],
|
||||
"enable_in_bg": 0,
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [],
|
||||
"notify_repeat_step": 60,
|
||||
"recover_duration": 0,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": []
|
||||
},
|
||||
{
|
||||
"name": "AggregatedAPIErrors",
|
||||
"note": "An aggregated API {{ $labels.name }}/{{ $labels.namespace }} has reported errors. The number of errors have increased for it in the past five minutes. High values indicate that the availability of the service changes too often.",
|
||||
"severity": 2,
|
||||
"disabled": 0,
|
||||
"prom_for_duration": 0,
|
||||
"prom_ql": "sum by(name, namespace)(increase(aggregator_unavailable_apiservice_count[5m])) > 2\n",
|
||||
"prom_eval_interval": 15,
|
||||
"enable_stime": "00:00",
|
||||
"enable_etime": "23:59",
|
||||
"enable_days_of_week": [
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6",
|
||||
"0"
|
||||
],
|
||||
"enable_in_bg": 0,
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [],
|
||||
"notify_repeat_step": 60,
|
||||
"recover_duration": 0,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": []
|
||||
},
|
||||
{
|
||||
"name": "AggregatedAPIDown",
|
||||
"note": "An aggregated API {{ $labels.name }}/{{ $labels.namespace }} has been only {{ $value | humanize }}% available over the last 10m.",
|
||||
"severity": 2,
|
||||
"disabled": 0,
|
||||
"prom_for_duration": 300,
|
||||
"prom_ql": "(1 - max by(name, namespace)(avg_over_time(aggregator_unavailable_apiservice[10m]))) * 100 < 85\n",
|
||||
"prom_eval_interval": 15,
|
||||
"enable_stime": "00:00",
|
||||
"enable_etime": "23:59",
|
||||
"enable_days_of_week": [
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6",
|
||||
"0"
|
||||
],
|
||||
"enable_in_bg": 0,
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [],
|
||||
"notify_repeat_step": 60,
|
||||
"recover_duration": 0,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": []
|
||||
},
|
||||
{
|
||||
"name": "KubeAPIDown",
|
||||
"note": "KubeAPI has disappeared from Prometheus target discovery.",
|
||||
"severity": 1,
|
||||
"disabled": 0,
|
||||
"prom_for_duration": 900,
|
||||
"prom_ql": "absent(up{job=\"apiserver\"} == 1)\n",
|
||||
"prom_eval_interval": 15,
|
||||
"enable_stime": "00:00",
|
||||
"enable_etime": "23:59",
|
||||
"enable_days_of_week": [
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6",
|
||||
"0"
|
||||
],
|
||||
"enable_in_bg": 0,
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [],
|
||||
"notify_repeat_step": 60,
|
||||
"recover_duration": 0,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": []
|
||||
},
|
||||
{
|
||||
"name": "KubeAPIErrorBudgetBurn-S1-120秒",
|
||||
"note": "The API server is burning too much error budget.",
|
||||
"severity": 1,
|
||||
"disabled": 0,
|
||||
"prom_for_duration": 120,
|
||||
"prom_ql": "sum(apiserver_request:burnrate1h) > (14.40 * 0.01000)\nand\nsum(apiserver_request:burnrate5m) > (14.40 * 0.01000)\n",
|
||||
"prom_eval_interval": 15,
|
||||
"enable_stime": "00:00",
|
||||
"enable_etime": "23:59",
|
||||
"enable_days_of_week": [
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6",
|
||||
"0"
|
||||
],
|
||||
"enable_in_bg": 0,
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [],
|
||||
"notify_repeat_step": 60,
|
||||
"recover_duration": 0,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": [
|
||||
"long=1h",
|
||||
"short=5m"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "KubeAPIErrorBudgetBurn-S1-900秒",
|
||||
"note": "The API server is burning too much error budget.",
|
||||
"severity": 1,
|
||||
"disabled": 0,
|
||||
"prom_for_duration": 900,
|
||||
"prom_ql": "sum(apiserver_request:burnrate6h) > (6.00 * 0.01000)\nand\nsum(apiserver_request:burnrate30m) > (6.00 * 0.01000)\n",
|
||||
"prom_eval_interval": 15,
|
||||
"enable_stime": "00:00",
|
||||
"enable_etime": "23:59",
|
||||
"enable_days_of_week": [
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6",
|
||||
"0"
|
||||
],
|
||||
"enable_in_bg": 0,
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [],
|
||||
"notify_repeat_step": 60,
|
||||
"recover_duration": 0,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": [
|
||||
"long=6h",
|
||||
"short=30m"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "KubeAPIErrorBudgetBurn-S2-3600秒",
|
||||
"note": "The API server is burning too much error budget.",
|
||||
"severity": 2,
|
||||
"disabled": 0,
|
||||
"prom_for_duration": 3600,
|
||||
"prom_ql": "sum(apiserver_request:burnrate1d) > (3.00 * 0.01000)\nand\nsum(apiserver_request:burnrate2h) > (3.00 * 0.01000)\n",
|
||||
"prom_eval_interval": 15,
|
||||
"enable_stime": "00:00",
|
||||
"enable_etime": "23:59",
|
||||
"enable_days_of_week": [
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6",
|
||||
"0"
|
||||
],
|
||||
"enable_in_bg": 0,
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [],
|
||||
"notify_repeat_step": 60,
|
||||
"recover_duration": 0,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": [
|
||||
"long=1d",
|
||||
"short=2h"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "KubeAPIErrorBudgetBurn-S2-10800秒",
|
||||
"note": "The API server is burning too much error budget.",
|
||||
"severity": 2,
|
||||
"disabled": 0,
|
||||
"prom_for_duration": 10800,
|
||||
"prom_ql": "sum(apiserver_request:burnrate3d) > (1.00 * 0.01000)\nand\nsum(apiserver_request:burnrate6h) > (1.00 * 0.01000)\n",
|
||||
"prom_eval_interval": 15,
|
||||
"enable_stime": "00:00",
|
||||
"enable_etime": "23:59",
|
||||
"enable_days_of_week": [
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6",
|
||||
"0"
|
||||
],
|
||||
"enable_in_bg": 0,
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [],
|
||||
"notify_repeat_step": 60,
|
||||
"recover_duration": 0,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": [
|
||||
"long=3d",
|
||||
"short=6h"
|
||||
]
|
||||
}
|
||||
]
|
||||
366
integrations/kubernetes/alerts/kubelet.json
Normal file
@@ -0,0 +1,366 @@
|
||||
[
|
||||
{
|
||||
"name": "Node状态异常",
|
||||
"note": "{{ $labels.node }} has been unready for more than 15 minutes.",
|
||||
"severity": 2,
|
||||
"disabled": 0,
|
||||
"prom_for_duration": 900,
|
||||
"prom_ql": "kube_node_status_condition{job=\"kube-state-metrics\",condition=\"Ready\",status=\"true\"} == 0\n",
|
||||
"prom_eval_interval": 15,
|
||||
"enable_stime": "00:00",
|
||||
"enable_etime": "23:59",
|
||||
"enable_days_of_week": [
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6",
|
||||
"0"
|
||||
],
|
||||
"enable_in_bg": 0,
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [],
|
||||
"notify_repeat_step": 60,
|
||||
"recover_duration": 0,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": []
|
||||
},
|
||||
{
|
||||
"name": "Node不可达",
|
||||
"note": "{{ $labels.node }} is unreachable and some workloads may be rescheduled.",
|
||||
"severity": 2,
|
||||
"disabled": 0,
|
||||
"prom_for_duration": 900,
|
||||
"prom_ql": "(kube_node_spec_taint{job=\"kube-state-metrics\",key=\"node.kubernetes.io/unreachable\",effect=\"NoSchedule\"} unless ignoring(key,value) kube_node_spec_taint{job=\"kube-state-metrics\",key=~\"ToBeDeletedByClusterAutoscaler|cloud.google.com/impending-node-termination|aws-node-termination-handler/spot-itn\"}) == 1\n",
|
||||
"prom_eval_interval": 15,
|
||||
"enable_stime": "00:00",
|
||||
"enable_etime": "23:59",
|
||||
"enable_days_of_week": [
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6",
|
||||
"0"
|
||||
],
|
||||
"enable_in_bg": 0,
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [],
|
||||
"notify_repeat_step": 60,
|
||||
"recover_duration": 0,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": []
|
||||
},
|
||||
{
|
||||
"name": "Node运行太多Pod",
|
||||
"note": "Kubelet '{{ $labels.node }}' is running at {{ $value | humanizePercentage }} of its Pod capacity.",
|
||||
"severity": 2,
|
||||
"disabled": 0,
|
||||
"prom_for_duration": 900,
|
||||
"prom_ql": "count by(node) (\n (kube_pod_status_phase{job=\"kube-state-metrics\",phase=\"Running\"} == 1) * on(instance,pod,namespace,cluster) group_left(node) topk by(instance,pod,namespace,cluster) (1, kube_pod_info{job=\"kube-state-metrics\"})\n)\n/\nmax by(node) (\n kube_node_status_capacity_pods{job=\"kube-state-metrics\"} != 1\n) > 0.95\n",
|
||||
"prom_eval_interval": 15,
|
||||
"enable_stime": "00:00",
|
||||
"enable_etime": "23:59",
|
||||
"enable_days_of_week": [
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6",
|
||||
"0"
|
||||
],
|
||||
"enable_in_bg": 0,
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [],
|
||||
"notify_repeat_step": 60,
|
||||
"recover_duration": 0,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": []
|
||||
},
|
||||
{
|
||||
"name": "Node状态抖动",
|
||||
"note": "The readiness status of node {{ $labels.node }} has changed {{ $value }} times in the last 15 minutes.",
|
||||
"severity": 2,
|
||||
"disabled": 0,
|
||||
"prom_for_duration": 900,
|
||||
"prom_ql": "sum(changes(kube_node_status_condition{status=\"true\",condition=\"Ready\"}[15m])) by (node) > 2\n",
|
||||
"prom_eval_interval": 15,
|
||||
"enable_stime": "00:00",
|
||||
"enable_etime": "23:59",
|
||||
"enable_days_of_week": [
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6",
|
||||
"0"
|
||||
],
|
||||
"enable_in_bg": 0,
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [],
|
||||
"notify_repeat_step": 60,
|
||||
"recover_duration": 0,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": []
|
||||
},
|
||||
{
|
||||
"name": "PLEG耗时高",
|
||||
"note": "The Kubelet Pod Lifecycle Event Generator has a 99th percentile duration of {{ $value }} seconds on node {{ $labels.node }}.",
|
||||
"severity": 2,
|
||||
"disabled": 0,
|
||||
"prom_for_duration": 300,
|
||||
"prom_ql": "node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile{quantile=\"0.99\"} >= 10\n",
|
||||
"prom_eval_interval": 15,
|
||||
"enable_stime": "00:00",
|
||||
"enable_etime": "23:59",
|
||||
"enable_days_of_week": [
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6",
|
||||
"0"
|
||||
],
|
||||
"enable_in_bg": 0,
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [],
|
||||
"notify_repeat_step": 60,
|
||||
"recover_duration": 0,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": []
|
||||
},
|
||||
{
|
||||
"name": "Pod启动耗时高",
|
||||
"note": "Kubelet Pod startup 99th percentile latency is {{ $value }} seconds on node {{ $labels.node }}.",
|
||||
"severity": 2,
|
||||
"disabled": 0,
|
||||
"prom_for_duration": 900,
|
||||
"prom_ql": "histogram_quantile(0.99, sum(rate(kubelet_pod_worker_duration_seconds_bucket{job=\"kubelet\", metrics_path=\"/metrics\"}[5m])) by (instance, le)) * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\", metrics_path=\"/metrics\"} > 60\n",
|
||||
"prom_eval_interval": 15,
|
||||
"enable_stime": "00:00",
|
||||
"enable_etime": "23:59",
|
||||
"enable_days_of_week": [
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6",
|
||||
"0"
|
||||
],
|
||||
"enable_in_bg": 0,
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [],
|
||||
"notify_repeat_step": 60,
|
||||
"recover_duration": 0,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": []
|
||||
},
|
||||
{
|
||||
"name": "客户端证书过期-S2",
|
||||
"note": "Client certificate for Kubelet on node {{ $labels.node }} expires in {{ $value | humanizeDuration }}.",
|
||||
"severity": 2,
|
||||
"disabled": 0,
|
||||
"prom_for_duration": 0,
|
||||
"prom_ql": "kubelet_certificate_manager_client_ttl_seconds < 604800\n",
|
||||
"prom_eval_interval": 15,
|
||||
"enable_stime": "00:00",
|
||||
"enable_etime": "23:59",
|
||||
"enable_days_of_week": [
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6",
|
||||
"0"
|
||||
],
|
||||
"enable_in_bg": 0,
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [],
|
||||
"notify_repeat_step": 60,
|
||||
"recover_duration": 0,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": []
|
||||
},
|
||||
{
|
||||
"name": "客户端证书过期-S1",
|
||||
"note": "Client certificate for Kubelet on node {{ $labels.node }} expires in {{ $value | humanizeDuration }}.",
|
||||
"severity": 1,
|
||||
"disabled": 0,
|
||||
"prom_for_duration": 0,
|
||||
"prom_ql": "kubelet_certificate_manager_client_ttl_seconds < 86400\n",
|
||||
"prom_eval_interval": 15,
|
||||
"enable_stime": "00:00",
|
||||
"enable_etime": "23:59",
|
||||
"enable_days_of_week": [
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6",
|
||||
"0"
|
||||
],
|
||||
"enable_in_bg": 0,
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [],
|
||||
"notify_repeat_step": 60,
|
||||
"recover_duration": 0,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": []
|
||||
},
|
||||
{
|
||||
"name": "服务端证书过期-S2",
|
||||
"note": "Server certificate for Kubelet on node {{ $labels.node }} expires in {{ $value | humanizeDuration }}.",
|
||||
"severity": 2,
|
||||
"disabled": 0,
|
||||
"prom_for_duration": 0,
|
||||
"prom_ql": "kubelet_certificate_manager_server_ttl_seconds < 604800\n",
|
||||
"prom_eval_interval": 15,
|
||||
"enable_stime": "00:00",
|
||||
"enable_etime": "23:59",
|
||||
"enable_days_of_week": [
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6",
|
||||
"0"
|
||||
],
|
||||
"enable_in_bg": 0,
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [],
|
||||
"notify_repeat_step": 60,
|
||||
"recover_duration": 0,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": []
|
||||
},
|
||||
{
|
||||
"name": "服务端证书过期-S1",
|
||||
"note": "Server certificate for Kubelet on node {{ $labels.node }} expires in {{ $value | humanizeDuration }}.",
|
||||
"severity": 1,
|
||||
"disabled": 0,
|
||||
"prom_for_duration": 0,
|
||||
"prom_ql": "kubelet_certificate_manager_server_ttl_seconds < 86400\n",
|
||||
"prom_eval_interval": 15,
|
||||
"enable_stime": "00:00",
|
||||
"enable_etime": "23:59",
|
||||
"enable_days_of_week": [
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6",
|
||||
"0"
|
||||
],
|
||||
"enable_in_bg": 0,
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [],
|
||||
"notify_repeat_step": 60,
|
||||
"recover_duration": 0,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": []
|
||||
},
|
||||
{
|
||||
"name": "客户端证书续签错误",
|
||||
"note": "Kubelet on node {{ $labels.node }} has failed to renew its client certificate ({{ $value | humanize }} errors in the last 5 minutes).",
|
||||
"severity": 2,
|
||||
"disabled": 0,
|
||||
"prom_for_duration": 900,
|
||||
"prom_ql": "increase(kubelet_certificate_manager_client_expiration_renew_errors[5m]) > 0\n",
|
||||
"prom_eval_interval": 15,
|
||||
"enable_stime": "00:00",
|
||||
"enable_etime": "23:59",
|
||||
"enable_days_of_week": [
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6",
|
||||
"0"
|
||||
],
|
||||
"enable_in_bg": 0,
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [],
|
||||
"notify_repeat_step": 60,
|
||||
"recover_duration": 0,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": []
|
||||
},
|
||||
{
|
||||
"name": "服务证书续签错误",
|
||||
"note": "Kubelet on node {{ $labels.node }} has failed to renew its server certificate ({{ $value | humanize }} errors in the last 5 minutes).",
|
||||
"severity": 2,
|
||||
"disabled": 0,
|
||||
"prom_for_duration": 900,
|
||||
"prom_ql": "increase(kubelet_server_expiration_renew_errors[5m]) > 0\n",
|
||||
"prom_eval_interval": 15,
|
||||
"enable_stime": "00:00",
|
||||
"enable_etime": "23:59",
|
||||
"enable_days_of_week": [
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6",
|
||||
"0"
|
||||
],
|
||||
"enable_in_bg": 0,
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [],
|
||||
"notify_repeat_step": 60,
|
||||
"recover_duration": 0,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": []
|
||||
},
|
||||
{
|
||||
"name": "kubelet故障",
|
||||
"note": "Kubelet has disappeared from Prometheus target discovery.",
|
||||
"severity": 1,
|
||||
"disabled": 0,
|
||||
"prom_for_duration": 900,
|
||||
"prom_ql": "absent(up{job=\"kubelet\"} == 1)\n",
|
||||
"prom_eval_interval": 15,
|
||||
"enable_stime": "00:00",
|
||||
"enable_etime": "23:59",
|
||||
"enable_days_of_week": [
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6",
|
||||
"0"
|
||||
],
|
||||
"enable_in_bg": 0,
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [],
|
||||
"notify_repeat_step": 60,
|
||||
"recover_duration": 0,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": []
|
||||
}
|
||||
]
|
||||
1010
integrations/kubernetes/alerts/node-exporter.json
Normal file
642
integrations/kubernetes/dashboards/APIServer.json
Normal file
@@ -0,0 +1,642 @@
|
||||
{
|
||||
"name": "Kubernetes / API Server",
|
||||
"tags": "Categraf",
|
||||
"ident": "",
|
||||
"configs": {
|
||||
"version": "2.0.0",
|
||||
"links": [],
|
||||
"var": [],
|
||||
"panels": [
|
||||
{
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "up{job=\"apiserver\"}",
|
||||
"legend": "{{ instance }}"
|
||||
}
|
||||
],
|
||||
"name": "API Server - Health Status",
|
||||
"links": [],
|
||||
"description": "apiserver的实例健康状态,0表示down,1表示up",
|
||||
"custom": {
|
||||
"showHeader": true,
|
||||
"colorMode": "value",
|
||||
"calc": "lastNotNull",
|
||||
"displayMode": "seriesToRows"
|
||||
},
|
||||
"options": {
|
||||
"valueMappings": [
|
||||
{
|
||||
"options": {
|
||||
"0": {
|
||||
"text": "DOWN"
|
||||
},
|
||||
"1": {
|
||||
"text": "UP"
|
||||
}
|
||||
},
|
||||
"type": "value"
|
||||
},
|
||||
{
|
||||
"type": "special",
|
||||
"result": {
|
||||
"color": "#3fc453",
|
||||
"text": "UP"
|
||||
},
|
||||
"match": {
|
||||
"special": 1
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "special",
|
||||
"result": {
|
||||
"color": "#f80202",
|
||||
"text": "DOWN"
|
||||
},
|
||||
"match": {
|
||||
"special": 0
|
||||
}
|
||||
}
|
||||
],
|
||||
"standardOptions": {
|
||||
"util": "none"
|
||||
}
|
||||
},
|
||||
"overrides": [
|
||||
{}
|
||||
],
|
||||
"version": "2.0.0",
|
||||
"type": "table",
|
||||
"layout": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 0,
|
||||
"i": "98f46bc1-c078-40f2-915c-f0836957bf2f",
|
||||
"isResizable": true
|
||||
},
|
||||
"id": "98f46bc1-c078-40f2-915c-f0836957bf2f"
|
||||
},
|
||||
{
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "apiserver_requested_deprecated_apis{job=\"apiserver\"}",
|
||||
"legend": ""
|
||||
}
|
||||
],
|
||||
"name": "Deprecated Kubernetes Resources",
|
||||
"links": [],
|
||||
"description": "当前版本apiserver使用,未来版本中要移除的资源",
|
||||
"custom": {
|
||||
"showHeader": true,
|
||||
"colorMode": "value",
|
||||
"calc": "lastNotNull",
|
||||
"displayMode": "labelsOfSeriesToRows",
|
||||
"columns": [
|
||||
"group",
|
||||
"version",
|
||||
"resource",
|
||||
"removed_release"
|
||||
],
|
||||
"sortOrder": "ascend"
|
||||
},
|
||||
"options": {
|
||||
"standardOptions": {}
|
||||
},
|
||||
"overrides": [
|
||||
{}
|
||||
],
|
||||
"version": "2.0.0",
|
||||
"type": "table",
|
||||
"layout": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 0,
|
||||
"i": "73beb13a-bd10-4a68-bb9e-5b9ab63da154",
|
||||
"isResizable": true
|
||||
},
|
||||
"id": "73beb13a-bd10-4a68-bb9e-5b9ab63da154"
|
||||
},
|
||||
{
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "sum by (instance,code) (rate(apiserver_request_total{job=\"apiserver\"}[5m]))",
|
||||
"legend": "{{ instance }} {{ code }}"
|
||||
}
|
||||
],
|
||||
"name": "API Server - HTTP Requests by code",
|
||||
"links": [],
|
||||
"description": "按照返回码分类统计apiserver请求数",
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "list",
|
||||
"placement": "bottom"
|
||||
},
|
||||
"standardOptions": {
|
||||
"util": "none"
|
||||
},
|
||||
"thresholds": {
|
||||
"steps": []
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "smooth",
|
||||
"lineWidth": 1,
|
||||
"fillOpacity": 0.25,
|
||||
"gradientMode": "none",
|
||||
"stack": "off"
|
||||
},
|
||||
"version": "2.0.0",
|
||||
"type": "timeseries",
|
||||
"layout": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 8,
|
||||
"i": "1cfa42b1-9dcf-471c-90ff-8ffe656d4b11",
|
||||
"isResizable": true
|
||||
},
|
||||
"id": "1cfa42b1-9dcf-471c-90ff-8ffe656d4b11"
|
||||
},
|
||||
{
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "sum by (instance,verb) (rate(apiserver_request_total{job=\"apiserver\"}[5m]))",
|
||||
"legend": "{{ instance }} {{ verb }}"
|
||||
}
|
||||
],
|
||||
"name": "API Server - HTTP Requests by verb",
|
||||
"links": [],
|
||||
"description": "按照请求动作分类统计apiserver的请求数",
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "desc"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "list",
|
||||
"placement": "bottom"
|
||||
},
|
||||
"standardOptions": {
|
||||
"util": "none"
|
||||
},
|
||||
"thresholds": {
|
||||
"steps": []
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "smooth",
|
||||
"lineWidth": 1,
|
||||
"fillOpacity": 0.25,
|
||||
"gradientMode": "none",
|
||||
"stack": "off"
|
||||
},
|
||||
"version": "2.0.0",
|
||||
"type": "timeseries",
|
||||
"layout": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 8,
|
||||
"i": "94def0cb-0b86-42f7-a4b2-dde714bbb918",
|
||||
"isResizable": true
|
||||
},
|
||||
"id": "94def0cb-0b86-42f7-a4b2-dde714bbb918"
|
||||
},
|
||||
{
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "apiserver_current_inflight_requests{job=\"apiserver\"}",
|
||||
"legend": "{{ instance }} {{ request_kind }}"
|
||||
}
|
||||
],
|
||||
"name": "API Server - Current Inflight Requests by kind",
|
||||
"links": [],
|
||||
"description": "当前并发请求apiserver的数量",
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "desc"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "list",
|
||||
"placement": "bottom"
|
||||
},
|
||||
"standardOptions": {
|
||||
"util": "none"
|
||||
},
|
||||
"thresholds": {
|
||||
"steps": []
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "smooth",
|
||||
"lineWidth": 1,
|
||||
"fillOpacity": 0.25,
|
||||
"gradientMode": "none",
|
||||
"stack": "off"
|
||||
},
|
||||
"version": "2.0.0",
|
||||
"type": "timeseries",
|
||||
"layout": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 16,
|
||||
"i": "ce5a15ad-11c6-44a2-a071-be57009162e1",
|
||||
"isResizable": true
|
||||
},
|
||||
"id": "fb6266a3-3da0-4310-bfe8-c64a53db5db3"
|
||||
},
|
||||
{
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "histogram_quantile(0.9, sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\"}[5m])) by (instance,verb,le))*1000",
|
||||
"legend": "{{ instance }} {{ verb }}"
|
||||
}
|
||||
],
|
||||
"name": "API Server - HTTP Requets Latency by verb",
|
||||
"links": [],
|
||||
"description": "apiserver的响应延迟,按请求动作分类统计",
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "list",
|
||||
"placement": "bottom"
|
||||
},
|
||||
"standardOptions": {
|
||||
"util": "milliseconds"
|
||||
},
|
||||
"thresholds": {
|
||||
"steps": []
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "smooth",
|
||||
"lineWidth": 1,
|
||||
"fillOpacity": 0.25,
|
||||
"gradientMode": "none",
|
||||
"stack": "off"
|
||||
},
|
||||
"version": "2.0.0",
|
||||
"type": "timeseries",
|
||||
"layout": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 16,
|
||||
"i": "045dca2d-d69b-47a7-b25e-656adb357e11",
|
||||
"isResizable": true
|
||||
},
|
||||
"id": "045dca2d-d69b-47a7-b25e-656adb357e11"
|
||||
},
|
||||
{
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "histogram_quantile(0.9, sum(rate(apiserver_request_duration_seconds_bucket {job=\"apiserver\",verb!=\"WATCH\"}[5m])) by (instance,le))*1000",
|
||||
"legend": "{{ instance }}"
|
||||
}
|
||||
],
|
||||
"name": "API Server - HTTP Requets Latency by instance",
|
||||
"links": [],
|
||||
"description": "apiserver的响应延迟(非watch请求)",
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "desc"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "list",
|
||||
"placement": "bottom"
|
||||
},
|
||||
"standardOptions": {
|
||||
"util": "milliseconds"
|
||||
},
|
||||
"thresholds": {
|
||||
"steps": []
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "smooth",
|
||||
"lineWidth": 1,
|
||||
"fillOpacity": 0.25,
|
||||
"gradientMode": "none",
|
||||
"stack": "off"
|
||||
},
|
||||
"version": "2.0.0",
|
||||
"type": "timeseries",
|
||||
"layout": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 24,
|
||||
"i": "1e775704-9ee4-45ce-9d24-b49af89fb5c7",
|
||||
"isResizable": true
|
||||
},
|
||||
"id": "1e775704-9ee4-45ce-9d24-b49af89fb5c7"
|
||||
},
|
||||
{
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "sum by(instance,verb) (rate(apiserver_request_total{code=~\"5..\",job=\"apiserver\"}[5m]))\n / sum by(instance,verb) (rate(apiserver_request_total{job=\"apiserver\"}[5m]))",
|
||||
"legend": "{{ instance }} {{ verb }}"
|
||||
}
|
||||
],
|
||||
"name": "API Server - Errors by verb",
|
||||
"links": [],
|
||||
"description": "apiserver的5xx错误率,按请求动作分类统计",
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "desc"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "list",
|
||||
"placement": "bottom"
|
||||
},
|
||||
"standardOptions": {
|
||||
"util": "none"
|
||||
},
|
||||
"thresholds": {
|
||||
"steps": []
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "smooth",
|
||||
"lineWidth": 1,
|
||||
"fillOpacity": 0.25,
|
||||
"gradientMode": "none",
|
||||
"stack": "off"
|
||||
},
|
||||
"version": "2.0.0",
|
||||
"type": "timeseries",
|
||||
"layout": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 24,
|
||||
"i": "1ca62e0b-72df-47d1-93ba-048ed49e9cb5",
|
||||
"isResizable": true
|
||||
},
|
||||
"id": "1ca62e0b-72df-47d1-93ba-048ed49e9cb5"
|
||||
},
|
||||
{
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "sum by(instance) (rate(apiserver_request_total{code=~\"5..\", job=\"apiserver\"}[5m]))\n / sum by(instance) (rate(apiserver_request_total{job=\"apiserver\"}[5m]))",
|
||||
"legend": "{{ instance }}"
|
||||
}
|
||||
],
|
||||
"name": "API Server - Errors by Instance",
|
||||
"links": [],
|
||||
"description": "apiserver的5xx 错误率(5xx请求数/总请求数)",
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "desc"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "list",
|
||||
"placement": "bottom"
|
||||
},
|
||||
"standardOptions": {
|
||||
"util": "none"
|
||||
},
|
||||
"thresholds": {
|
||||
"steps": []
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "smooth",
|
||||
"lineWidth": 1,
|
||||
"fillOpacity": 0.25,
|
||||
"gradientMode": "none",
|
||||
"stack": "off"
|
||||
},
|
||||
"version": "2.0.0",
|
||||
"type": "timeseries",
|
||||
"layout": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 32,
|
||||
"i": "92a209a1-7d30-4627-9ae1-55ded5095ed7",
|
||||
"isResizable": true
|
||||
},
|
||||
"id": "92a209a1-7d30-4627-9ae1-55ded5095ed7"
|
||||
},
|
||||
{
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "sum(rate(workqueue_depth{job=\"apiserver\"}[5m])) by (instance,name)",
|
||||
"legend": "{{ instance }} {{ name }}"
|
||||
}
|
||||
],
|
||||
"name": "API Server - Work Queue by instance",
|
||||
"links": [],
|
||||
"description": "apiserver工作队列深度,越接近0越好",
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "desc"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "list",
|
||||
"placement": "bottom"
|
||||
},
|
||||
"standardOptions": {
|
||||
"util": "none"
|
||||
},
|
||||
"thresholds": {
|
||||
"steps": []
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "smooth",
|
||||
"lineWidth": 1,
|
||||
"fillOpacity": 0.25,
|
||||
"gradientMode": "none",
|
||||
"stack": "off"
|
||||
},
|
||||
"version": "2.0.0",
|
||||
"type": "timeseries",
|
||||
"layout": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 32,
|
||||
"i": "83f22cf4-9c65-4ad3-900b-fa6fc914dd88",
|
||||
"isResizable": true
|
||||
},
|
||||
"id": "83f22cf4-9c65-4ad3-900b-fa6fc914dd88"
|
||||
},
|
||||
{
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "sum(rate(apiserver_request_total{job=\"apiserver\"}[5m])) by (instance)",
|
||||
"legend": "{{ instance }}"
|
||||
}
|
||||
],
|
||||
"name": "API Server - HTTP Requests by instance",
|
||||
"links": [],
|
||||
"description": "5分钟内apiserver的请求数统计",
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "desc"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "list",
|
||||
"placement": "bottom"
|
||||
},
|
||||
"standardOptions": {
|
||||
"util": "none"
|
||||
},
|
||||
"thresholds": {
|
||||
"steps": []
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "smooth",
|
||||
"lineWidth": 1,
|
||||
"fillOpacity": 0.25,
|
||||
"gradientMode": "none",
|
||||
"stack": "normal"
|
||||
},
|
||||
"version": "2.0.0",
|
||||
"type": "timeseries",
|
||||
"layout": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 40,
|
||||
"i": "3e9f9df7-d9fb-4791-b3b2-2c52678f060f",
|
||||
"isResizable": true
|
||||
},
|
||||
"id": "3e9f9df7-d9fb-4791-b3b2-2c52678f060f"
|
||||
},
|
||||
{
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "rate(process_cpu_seconds_total{job=\"apiserver\"}[5m])",
|
||||
"legend": "{{ instance }}"
|
||||
}
|
||||
],
|
||||
"name": "API Server - CPU Usage by instance",
|
||||
"links": [],
|
||||
"description": "apiserver的cpu使用率",
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "desc"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "list",
|
||||
"placement": "bottom"
|
||||
},
|
||||
"standardOptions": {
|
||||
"util": "percent",
|
||||
"decimals": 2
|
||||
},
|
||||
"thresholds": {
|
||||
"steps": []
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "smooth",
|
||||
"lineWidth": 1,
|
||||
"fillOpacity": 0.25,
|
||||
"gradientMode": "none",
|
||||
"stack": "off"
|
||||
},
|
||||
"version": "2.0.0",
|
||||
"type": "timeseries",
|
||||
"layout": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 40,
|
||||
"i": "3d5c1ae5-e640-4986-9202-78258169bffb",
|
||||
"isResizable": true
|
||||
},
|
||||
"id": "3d5c1ae5-e640-4986-9202-78258169bffb"
|
||||
},
|
||||
{
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "process_resident_memory_bytes{job=\"apiserver\"}",
|
||||
"legend": "{{ instance }}"
|
||||
}
|
||||
],
|
||||
"name": "API Server - Memory Usage by instance",
|
||||
"links": [],
|
||||
"description": "apiserver的内存使用量",
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "desc"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "list",
|
||||
"placement": "bottom"
|
||||
},
|
||||
"standardOptions": {
|
||||
"util": "bytesIEC"
|
||||
},
|
||||
"thresholds": {
|
||||
"steps": []
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "smooth",
|
||||
"lineWidth": 1,
|
||||
"fillOpacity": 0.25,
|
||||
"gradientMode": "none",
|
||||
"stack": "off"
|
||||
},
|
||||
"version": "2.0.0",
|
||||
"type": "timeseries",
|
||||
"layout": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 48,
|
||||
"i": "1550a2d5-c808-4174-865a-a41b2c16b486",
|
||||
"isResizable": true
|
||||
},
|
||||
"id": "1550a2d5-c808-4174-865a-a41b2c16b486"
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
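In the dashboard above, most panels carry the same UUID in `"id"` and in `layout.i`, but the "Current Inflight Requests by kind" panel has two different values. Whether Nightingale requires the two to match is not stated in this diff, so the following is only a minimal review aid, assuming the file has the same `configs.panels` layout as the sibling dashboards. The function name `mismatched_layout_ids` is hypothetical.

```python
import json


def mismatched_layout_ids(dashboard: dict) -> list[tuple[str, str, str]]:
    """Return (panel name, id, layout.i) for panels whose two identifiers differ."""
    mismatches = []
    for panel in dashboard.get("configs", {}).get("panels", []):
        panel_id = panel.get("id")
        layout_id = panel.get("layout", {}).get("i")
        if panel_id != layout_id:
            mismatches.append((panel.get("name", ""), panel_id, layout_id))
    return mismatches


# Trimmed-down sample: two panels copied from the diff, other keys omitted.
sample = json.loads("""
{
  "configs": {
    "panels": [
      {
        "name": "API Server - HTTP Requests by verb",
        "layout": {"i": "94def0cb-0b86-42f7-a4b2-dde714bbb918"},
        "id": "94def0cb-0b86-42f7-a4b2-dde714bbb918"
      },
      {
        "name": "API Server - Current Inflight Requests by kind",
        "layout": {"i": "ce5a15ad-11c6-44a2-a071-be57009162e1"},
        "id": "fb6266a3-3da0-4310-bfe8-c64a53db5db3"
      }
    ]
  }
}
""")

for name, panel_id, layout_id in mismatched_layout_ids(sample):
    print(f"{name}: id={panel_id} layout.i={layout_id}")
```

Running the same function over a full file loaded with `json.load` would list every panel worth a second look before merging.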
271
integrations/kubernetes/dashboards/Cadvisor.json
Normal file
@@ -0,0 +1,271 @@
|
||||
{
|
||||
"name": "Cadvisor",
|
||||
"tags": "",
|
||||
"ident": "",
|
||||
"configs": {
|
||||
"version": "2.0.0",
|
||||
"links": [],
|
||||
"var": [
|
||||
{
|
||||
"type": "query",
|
||||
"name": "host",
|
||||
"definition": "label_values({__name__=~\"container.*\"},instance)",
|
||||
"allValue": ".*",
|
||||
"allOption": true,
|
||||
"multi": false,
|
||||
"reg": ""
|
||||
},
|
||||
{
|
||||
"type": "query",
|
||||
"name": "container",
|
||||
"definition": "label_values({__name__=~\"container.*\", instance=~\"$host\"},name)",
|
||||
"allValue": ".*",
|
||||
"allOption": true,
|
||||
"multi": false,
|
||||
"reg": ""
|
||||
}
|
||||
],
|
||||
"panels": [
|
||||
{
|
||||
"version": "2.0.0",
|
||||
"id": "eeb56afe-8a3e-46d6-8923-aeb3d0f124ea",
|
||||
"type": "timeseries",
|
||||
"name": "CPU Usage",
|
||||
"links": [],
|
||||
"layout": {
|
||||
"h": 7,
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 0,
|
||||
"i": "eeb56afe-8a3e-46d6-8923-aeb3d0f124ea",
|
||||
"isResizable": true
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "sum(rate(container_cpu_usage_seconds_total{instance=~\"$host\",name=~\"$container\",name=~\".+\"}[5m])) by (name) *100",
|
||||
"legend": "{{name}}"
|
||||
}
|
||||
],
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden"
|
||||
},
|
||||
"thresholds": {
|
||||
"steps": [
|
||||
{
|
||||
"color": "#634CD9",
|
||||
"value": null,
|
||||
"type": "base"
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"version": "2.0.0",
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "linear",
|
||||
"fillOpacity": 0.5,
|
||||
"stack": "off"
|
||||
},
|
||||
"maxPerRow": 4
|
||||
},
|
||||
{
|
||||
"version": "2.0.0",
|
||||
"id": "6690fff4-c159-40e5-b340-65a3ba85e37e",
|
||||
"type": "timeseries",
|
||||
"name": "Memory Usage",
|
||||
"links": [],
|
||||
"layout": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 7,
|
||||
"i": "6690fff4-c159-40e5-b340-65a3ba85e37e",
|
||||
"isResizable": true
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "sum(container_memory_rss{instance=~\"$host\",name=~\"$container\",name=~\".+\"}) by (name)",
|
||||
"legend": "{{name}}"
|
||||
}
|
||||
],
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden"
|
||||
},
|
||||
"thresholds": {
|
||||
"steps": [
|
||||
{
|
||||
"color": "#634CD9",
|
||||
"value": null,
|
||||
"type": "base"
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"version": "2.0.0",
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "linear",
|
||||
"fillOpacity": 0.5,
|
||||
"stack": "off"
|
||||
},
|
||||
"maxPerRow": 4
|
||||
},
|
||||
{
|
||||
"version": "2.0.0",
|
||||
"id": "3c798af5-cfae-4962-9b70-85736df44bb1",
|
||||
"type": "timeseries",
|
||||
"name": "Memory Cached",
|
||||
"links": [],
|
||||
"layout": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 7,
|
||||
"i": "3c798af5-cfae-4962-9b70-85736df44bb1",
|
||||
"isResizable": true
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "sum(container_memory_cache{instance=~\"$host\",name=~\"$container\",name=~\".+\"}) by (name)",
|
||||
"legend": "{{name}}"
|
||||
}
|
||||
],
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden"
|
||||
},
|
||||
"thresholds": {
|
||||
"steps": [
|
||||
{
|
||||
"color": "#634CD9",
|
||||
"value": null,
|
||||
"type": "base"
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"version": "2.0.0",
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "linear",
|
||||
"fillOpacity": 0.5,
|
||||
"stack": "off"
|
||||
},
|
||||
"maxPerRow": 4
|
||||
},
|
||||
{
|
||||
"version": "2.0.0",
|
||||
"id": "111835e1-cfb5-40db-bb52-1aca74cf1a00",
|
||||
"type": "timeseries",
|
||||
"name": "Received Network Traffic",
|
||||
"links": [],
|
||||
"layout": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 15,
|
||||
"i": "111835e1-cfb5-40db-bb52-1aca74cf1a00",
|
||||
"isResizable": true
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "sum(rate(container_network_receive_bytes_total{instance=~\"$host\",name=~\"$container\",name=~\".+\"}[5m])) by (name)",
|
||||
"legend": "{{name}}"
|
||||
}
|
||||
],
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden"
|
||||
},
|
||||
"thresholds": {
|
||||
"steps": [
|
||||
{
|
||||
"color": "#634CD9",
|
||||
"value": null,
|
||||
"type": "base"
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"version": "2.0.0",
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "linear",
|
||||
"fillOpacity": 0.5,
|
||||
"stack": "off"
|
||||
},
|
||||
"maxPerRow": 4
|
||||
},
|
||||
{
|
||||
"version": "2.0.0",
|
||||
"id": "b8050f8f-aee7-4fa5-888d-b6025df14aa1",
|
||||
"type": "timeseries",
|
||||
"name": "Sent Network Traffic",
|
||||
"links": [],
|
||||
"layout": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 15,
|
||||
"i": "b8050f8f-aee7-4fa5-888d-b6025df14aa1",
|
||||
"isResizable": true
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "sum(rate(container_network_transmit_bytes_total{instance=~\"$host\",name=~\"$container\",name=~\".+\"}[5m])) by (name)",
|
||||
"legend": "{{name}}"
|
||||
}
|
||||
],
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden"
|
||||
},
|
||||
"thresholds": {
|
||||
"steps": [
|
||||
{
|
||||
"color": "#634CD9",
|
||||
"value": null,
|
||||
"type": "base"
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"version": "2.0.0",
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "linear",
|
||||
"fillOpacity": 0.5,
|
||||
"stack": "off"
|
||||
},
|
||||
"maxPerRow": 4
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
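The Cadvisor panels above reference the dashboard variables `$host` and `$container` inside their PromQL expressions, and both variables declare `"allValue": ".*"`. As a rough illustration of how such placeholders end up in the final query, here is a simplified sketch. It is not Nightingale's actual substitution code; the function `expand_variables` and the chosen values are assumptions made for the example.

```python
import re


def expand_variables(expr: str, values: dict[str, str]) -> str:
    """Replace $name and ${name} placeholders with the selected values.

    Unknown variables are left untouched so that a missing selection is easy to spot.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1) or match.group(2)
        return values.get(name, match.group(0))

    return re.sub(r"\$\{(\w+)\}|\$(\w+)", substitute, expr)


# Expression copied from the "Memory Usage" panel (JSON escaping removed).
expr = 'sum(container_memory_rss{instance=~"$host",name=~"$container",name=~".+"}) by (name)'

# "All" selected for host (the allValue ".*"), one container selected by name.
expanded = expand_variables(expr, {"host": ".*", "container": "nginx"})
print(expanded)
```

Because the matchers use `=~`, the `.*` value for "All" simply matches every instance, while the extra `name=~".+"` matcher still filters out series without a container name.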
1605
integrations/kubernetes/dashboards/Container.json
Normal file
1110
integrations/kubernetes/dashboards/ControllerManager.json
Normal file
1238
integrations/kubernetes/dashboards/KubeStateMetrics.json
Normal file
438
integrations/kubernetes/dashboards/KubeletMetrics.json
Normal file
@@ -0,0 +1,438 @@
|
||||
{
|
||||
"name": "Kubernetes / Kubelet Metrics",
|
||||
"tags": "Categraf",
|
||||
"ident": "",
|
||||
"configs": {
|
||||
"var": [
|
||||
{
|
||||
"name": "cluster",
|
||||
"definition": "label_values(kubelet_running_pods, cluster)",
|
||||
"multi": true,
|
||||
"allOption": true
|
||||
},
|
||||
{
|
||||
"name": "instance",
|
||||
"definition": "label_values(kubelet_running_pods{cluster=~\"$cluster\"}, instance)",
|
||||
"multi": true,
|
||||
"allOption": true
|
||||
}
|
||||
],
|
||||
"panels": [
|
||||
{
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "sum(up{source=\"kubelet\", cluster=~\"$cluster\"})"
|
||||
}
|
||||
],
|
||||
"name": "Kubelet UP",
|
||||
"custom": {
|
||||
"textMode": "value",
|
||||
"colorMode": "background",
|
||||
"calc": "lastNotNull",
|
||||
"colSpan": 1,
|
||||
"textSize": {}
|
||||
},
|
||||
"options": {
|
||||
"standardOptions": {}
|
||||
},
|
||||
"version": "2.0.0",
|
||||
"type": "stat",
|
||||
"layout": {
|
||||
"h": 3,
|
||||
"w": 4,
|
||||
"x": 0,
|
||||
"y": 0,
|
||||
"i": "d3caf396-b3a1-449b-acec-f550967889e6",
|
||||
"isResizable": true
|
||||
},
|
||||
"id": "d3caf396-b3a1-449b-acec-f550967889e6"
|
||||
},
|
||||
{
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "sum(kubelet_running_pods{cluster=~\"$cluster\", instance=~\"$instance\"})"
|
||||
}
|
||||
],
|
||||
"name": "Running Pods",
|
||||
"custom": {
|
||||
"textMode": "valueAndName",
|
||||
"colorMode": "value",
|
||||
"calc": "lastNotNull",
|
||||
"colSpan": 1,
|
||||
"textSize": {}
|
||||
},
|
||||
"options": {
|
||||
"standardOptions": {}
|
||||
},
|
||||
"version": "2.0.0",
|
||||
"type": "stat",
|
||||
"layout": {
|
||||
"h": 3,
|
||||
"w": 4,
|
||||
"x": 4,
|
||||
"y": 0,
|
||||
"i": "38c38b23-a7e3-4177-8c41-3ce955ea0434",
|
||||
"isResizable": true
|
||||
},
|
||||
"id": "38c38b23-a7e3-4177-8c41-3ce955ea0434"
|
||||
},
|
||||
{
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "sum(kubelet_running_containers{cluster=~\"$cluster\", instance=~\"$instance\", container_state=\"running\"})"
|
||||
}
|
||||
],
|
||||
"name": "Running Containers",
|
||||
"custom": {
|
||||
"textMode": "valueAndName",
|
||||
"colorMode": "value",
|
||||
"calc": "lastNotNull",
|
||||
"colSpan": 1,
|
||||
"textSize": {}
|
||||
},
|
||||
"options": {
|
||||
"standardOptions": {}
|
||||
},
|
||||
"version": "2.0.0",
|
||||
"type": "stat",
|
||||
"layout": {
|
||||
"h": 3,
|
||||
"w": 4,
|
||||
"x": 8,
|
||||
"y": 0,
|
||||
"i": "26bf2320-fcff-48f8-a6fc-aa9076bb9329",
|
||||
"isResizable": true
|
||||
},
|
||||
"id": "525859b9-91d7-4180-b363-bf8ceec977d8"
|
||||
},
|
||||
{
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "sum(volume_manager_total_volumes{cluster=~\"$cluster\", instance=~\"$instance\", state=\"desired_state_of_world\"})"
|
||||
}
|
||||
],
|
||||
"name": "Desired Volumes",
|
||||
"custom": {
|
||||
"textMode": "valueAndName",
|
||||
"colorMode": "value",
|
||||
"calc": "lastNotNull",
|
||||
"colSpan": 1,
|
||||
"textSize": {}
|
||||
},
|
||||
"options": {
|
||||
"standardOptions": {}
|
||||
},
|
||||
"version": "2.0.0",
|
||||
"type": "stat",
|
||||
"layout": {
|
||||
"h": 3,
|
||||
"w": 4,
|
||||
"x": 12,
|
||||
"y": 0,
|
||||
"i": "54ae4ab3-e932-418c-a637-f2f515cce1b9",
|
||||
"isResizable": true
|
||||
},
|
||||
"id": "84af4617-2ae0-4b30-a82a-6e8586342224"
|
||||
},
|
||||
{
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "sum(volume_manager_total_volumes{cluster=~\"$cluster\", instance=~\"$instance\", state=\"actual_state_of_world\"})"
|
||||
}
|
||||
],
|
||||
"name": "Actual Volumes",
|
||||
"custom": {
|
||||
"textMode": "valueAndName",
|
||||
"colorMode": "value",
|
||||
"calc": "lastNotNull",
|
||||
"colSpan": 1,
|
||||
"textSize": {}
|
||||
},
|
||||
"options": {
|
||||
"standardOptions": {}
|
||||
},
|
||||
"version": "2.0.0",
|
||||
"type": "stat",
|
||||
"layout": {
|
||||
"h": 3,
|
||||
"w": 4,
|
||||
"x": 16,
|
||||
"y": 0,
|
||||
"i": "d9de76d7-2203-40e7-a792-9888ec869e82",
|
||||
"isResizable": true
|
||||
},
|
||||
"id": "d431f4bd-9115-41d2-a494-1d680bdd1e0f"
|
||||
},
|
||||
{
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "sum(increase(kubelet_runtime_operations_errors_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m]))"
|
||||
}
|
||||
],
|
||||
"name": "OP Errors in 5min",
|
||||
"custom": {
|
||||
"textMode": "value",
|
||||
"colorMode": "value",
|
||||
"calc": "lastNotNull",
|
||||
"colSpan": 1,
|
||||
"textSize": {}
|
||||
},
|
||||
"options": {
|
||||
"valueMappings": [
|
||||
{
|
||||
"type": "range",
|
||||
"match": {
|
||||
"from": 1
|
||||
},
|
||||
"result": {
|
||||
"color": "#d0021b"
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "range",
|
||||
"match": {
|
||||
"to": 1
|
||||
},
|
||||
"result": {
|
||||
"color": "#417505"
|
||||
}
|
||||
}
|
||||
],
|
||||
"standardOptions": {}
|
||||
},
|
||||
"version": "2.0.0",
|
||||
"type": "stat",
|
||||
"layout": {
|
||||
"h": 3,
|
||||
"w": 4,
|
||||
"x": 20,
|
||||
"y": 0,
|
||||
"i": "bf2bbd15-347d-404c-9b8f-e524875befe2",
|
||||
"isResizable": true
|
||||
},
|
||||
"id": "54de62bc-8af3-4c27-8b8e-1af567b363fc"
|
||||
},
|
||||
{
|
||||
"type": "row",
|
||||
"id": "730d4a9b-791f-4aaf-a042-668f66e73814",
|
||||
"name": "Operations",
|
||||
"collapsed": true,
|
||||
"layout": {
|
||||
"h": 1,
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 3,
|
||||
"i": "730d4a9b-791f-4aaf-a042-668f66e73814",
|
||||
"isResizable": false
|
||||
},
|
||||
"panels": []
|
||||
},
|
||||
{
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "increase(kubelet_runtime_operations_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])"
|
||||
}
|
||||
],
|
||||
"name": "Operations in 5min",
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden"
|
||||
},
|
||||
"standardOptions": {},
|
||||
"thresholds": {}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "smooth",
|
||||
"fillOpacity": 0.5,
|
||||
"stack": "off"
|
||||
},
|
||||
"version": "2.0.0",
|
||||
"type": "timeseries",
|
||||
"layout": {
|
||||
"h": 4,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 4,
|
||||
"i": "d26e6818-6704-492a-8cbf-58473dd85716",
|
||||
"isResizable": true
|
||||
},
|
||||
"id": "d26e6818-6704-492a-8cbf-58473dd85716"
|
||||
},
|
||||
{
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "increase(kubelet_runtime_operations_errors_total{cluster=~\"$cluster\", instance=~\"$instance\"}[5m])"
|
||||
}
|
||||
],
|
||||
"name": "Operation Errors in 5min",
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden"
|
||||
},
|
||||
"standardOptions": {},
|
||||
"thresholds": {}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "smooth",
|
||||
"fillOpacity": 0.5,
|
||||
"stack": "off"
|
||||
},
|
||||
"version": "2.0.0",
|
||||
"type": "timeseries",
|
||||
"layout": {
|
||||
"h": 4,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 4,
|
||||
"i": "4e585d2f-c61c-4350-86ec-dca7ddc34ceb",
|
||||
"isResizable": true
|
||||
},
|
||||
"id": "09a6ad5b-8c0e-4f17-b17f-3ebc514f7d20"
|
||||
},
|
||||
{
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "increase(kubelet_runtime_operations_duration_seconds_sum{cluster=~\"$cluster\", instance=~\"$instance\"}[1h])/increase(kubelet_runtime_operations_duration_seconds_count{cluster=~\"$cluster\", instance=~\"$instance\"}[1h])"
|
||||
}
|
||||
],
|
||||
"name": "Average Operation duration in 1 hour (Unit: Second)",
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden"
|
||||
},
|
||||
"standardOptions": {},
|
||||
"thresholds": {}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "smooth",
|
||||
"fillOpacity": 0.5,
|
||||
"stack": "off"
|
||||
},
|
||||
"version": "2.0.0",
|
||||
"type": "timeseries",
|
||||
"layout": {
|
||||
"h": 4,
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 8,
|
||||
"i": "b5e56f3e-fa20-4c19-8578-c0610fa0a7e7",
|
||||
"isResizable": true
|
||||
},
|
||||
"id": "b5e56f3e-fa20-4c19-8578-c0610fa0a7e7"
|
||||
},
|
||||
{
|
||||
"type": "row",
|
||||
"id": "dd7e84c5-03ce-467c-871a-aa110fe051f4",
|
||||
"name": "PLEG relist",
|
||||
"collapsed": true,
|
||||
"layout": {
|
||||
"h": 1,
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 12,
|
||||
"i": "dd7e84c5-03ce-467c-871a-aa110fe051f4",
|
||||
"isResizable": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "rate(kubelet_pleg_relist_duration_seconds_count{cluster=~\"$cluster\", instance=~\"$instance\"}[1h])"
|
||||
}
|
||||
],
|
||||
"name": "relist rate",
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden"
|
||||
},
|
||||
"standardOptions": {},
|
||||
"thresholds": {}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "smooth",
|
||||
"fillOpacity": 0.5,
|
||||
"stack": "off"
|
||||
},
|
||||
"version": "2.0.0",
|
||||
"type": "timeseries",
|
||||
"layout": {
|
||||
"h": 4,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 13,
|
||||
"i": "f3822da8-a9c9-4db1-ba12-465d3ece823e",
|
||||
"isResizable": true
|
||||
},
|
||||
"id": "f3822da8-a9c9-4db1-ba12-465d3ece823e"
|
||||
},
|
||||
{
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "increase(kubelet_pleg_relist_duration_seconds_sum{cluster=~\"$cluster\", instance=~\"$instance\"}[1h])/increase(kubelet_pleg_relist_duration_seconds_count{cluster=~\"$cluster\", instance=~\"$instance\"}[1h])"
|
||||
}
|
||||
],
|
||||
"name": "relist duration (Unit: Second)",
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden"
|
||||
},
|
||||
"standardOptions": {},
|
||||
"thresholds": {}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "smooth",
|
||||
"fillOpacity": 0.5,
|
||||
"stack": "off"
|
||||
},
|
||||
"version": "2.0.0",
|
||||
"type": "timeseries",
|
||||
"layout": {
|
||||
"h": 4,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 13,
|
||||
"i": "2b4ada76-6c30-42cd-9bd3-c939b4c0139c",
|
||||
"isResizable": true
|
||||
},
|
||||
"id": "a6e4c914-bfca-4419-a264-f5b1cbab261a"
|
||||
}
|
||||
],
|
||||
"version": "2.0.0"
|
||||
}
|
||||
}
|
||||
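Two Kubelet panels above compute an average by dividing `increase(..._seconds_sum[1h])` by `increase(..._seconds_count[1h])`. The arithmetic behind that pattern can be sketched as follows. This is a simplified illustration with made-up sample values: PromQL's `increase` also extrapolates to the window edges and handles counter resets, which this sketch ignores, and PromQL yields NaN rather than `None` when the count does not change.

```python
from typing import Optional


def average_duration(sum_start: float, sum_end: float,
                     count_start: float, count_end: float) -> Optional[float]:
    """Average seconds per operation over a window, from two counter samples."""
    delta_count = count_end - count_start
    if delta_count == 0:
        # No operations completed in the window, so the average is undefined.
        return None
    return (sum_end - sum_start) / delta_count


# Hypothetical counter readings at the start and end of a one-hour window:
# total duration grew from 120 s to 150 s while 600 operations completed.
avg = average_duration(120.0, 150.0, 1000, 1600)
print(avg)
```

The result is the mean duration of the operations that completed inside the window, not a percentile; a slow tail can hide behind a healthy-looking average.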
1005
integrations/kubernetes/dashboards/Scheduler.json
Normal file
BIN
integrations/kubernetes/icon/kubernetes.png
Normal file
|
After Width: | Height: | Size: 20 KiB |
206
integrations/linux/dashboards/host_table_view_demo.json
Normal file
@@ -0,0 +1,206 @@
|
||||
{
|
||||
"name": "机器台账表格视图配置样例",
|
||||
"tags": "",
|
||||
"ident": "",
|
||||
"configs": {
|
||||
"links": [
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "n9e",
|
||||
"url": "https://n9e.github.io/"
|
||||
},
|
||||
{
|
||||
"targetBlank": true,
|
||||
"title": "author",
|
||||
"url": "http://flashcat.cloud/"
|
||||
}
|
||||
],
|
||||
"panels": [
|
||||
{
|
||||
"type": "table",
|
||||
"id": "77bf513a-8504-4d33-9efe-75aaf9abc9e4",
|
||||
"layout": {
|
||||
"h": 13,
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 0,
|
||||
"i": "77bf513a-8504-4d33-9efe-75aaf9abc9e4",
|
||||
"isResizable": true
|
||||
},
|
||||
"version": "3.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"datasourceValue": "${prom}",
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "avg(cpu_usage_active{cpu=\"cpu-total\", ident=~\"$ident\"}) by (ident)",
|
||||
"legend": "CPU使用率"
|
||||
},
|
||||
{
|
||||
"expr": "avg(mem_used_percent{ident=~\"$ident\"}) by (ident)",
|
||||
"refId": "B",
|
||||
"legend": "内存使用率"
|
||||
},
|
||||
{
|
||||
"expr": "avg(mem_total{ident=~\"$ident\"}) by (ident)",
|
||||
"refId": "C",
|
||||
"legend": "总内存"
|
||||
},
|
||||
{
|
||||
"expr": "avg(mem_free{ident=~\"$ident\"}) by (ident)",
|
||||
"refId": "D",
|
||||
"legend": "剩余内存"
|
||||
}
|
||||
],
|
||||
"transformations": [
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {
|
||||
"renameByName": {
|
||||
"ident": "机器"
|
||||
}
|
||||
}
|
||||
}
|
||||
],
|
||||
"name": "表格配置样例",
|
||||
"custom": {
|
||||
"showHeader": true,
|
||||
"colorMode": "background",
|
||||
"calc": "lastNotNull",
|
||||
"displayMode": "labelValuesToRows",
|
||||
"aggrDimension": "ident",
|
||||
"sortColumn": "ident",
|
||||
"sortOrder": "ascend"
|
||||
},
|
||||
"options": {
|
||||
"standardOptions": {}
|
||||
},
|
||||
"overrides": [
|
||||
{
|
||||
"matcher": {
|
||||
"value": "A"
|
||||
},
|
||||
"properties": {
|
||||
"valueMappings": [
|
||||
{
|
||||
"type": "range",
|
||||
"result": {
|
||||
"color": "#2c9d3d"
|
||||
},
|
||||
"match": {
|
||||
"to": 65
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "range",
|
||||
"result": {
|
||||
"color": "#ff656b"
|
||||
},
|
||||
"match": {
|
||||
"to": 90
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "range",
|
||||
"result": {
|
||||
"color": "#f50505"
|
||||
},
|
||||
"match": {
|
||||
"from": 90
|
||||
}
|
||||
}
|
||||
],
|
||||
"standardOptions": {
|
||||
"util": "percent"
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "special",
|
||||
"matcher": {
|
||||
"value": "B"
|
||||
},
|
||||
"properties": {
|
||||
"valueMappings": [
|
||||
{
|
||||
"type": "range",
|
||||
"result": {
|
||||
"color": "#2c9d3d"
|
||||
},
|
||||
"match": {
|
||||
"to": 65
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "range",
|
||||
"result": {
|
||||
"color": "#ff656b"
|
||||
},
|
||||
"match": {
|
||||
"to": 90
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "range",
|
||||
"result": {
|
||||
"color": "#fa0a0a"
|
||||
},
|
||||
"match": {
|
||||
"from": 90
|
||||
}
|
||||
}
|
||||
],
|
||||
"standardOptions": {
|
||||
"util": "percent"
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "special",
|
||||
"matcher": {
|
||||
"value": "C"
|
||||
},
|
||||
"properties": {
|
||||
"valueMappings": [],
|
||||
"standardOptions": {
|
||||
"util": "bytesIEC",
|
||||
"decimals": 2
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "special",
|
||||
"matcher": {
|
||||
"value": "D"
|
||||
},
|
||||
"properties": {
|
||||
"standardOptions": {
|
||||
"util": "bytesIEC",
|
||||
"decimals": 2
|
||||
}
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"var": [
|
||||
{
|
||||
"definition": "prometheus",
|
||||
"name": "prom",
|
||||
"type": "datasource"
|
||||
},
|
||||
{
|
||||
"allOption": true,
|
||||
"datasource": {
|
||||
"cate": "prometheus",
|
||||
"value": "${prom}"
|
||||
},
|
||||
"definition": "label_values(system_load1,ident)",
|
||||
"multi": true,
|
||||
"name": "ident",
|
||||
"type": "query"
|
||||
}
|
||||
],
|
||||
"version": "3.0.0"
|
||||
}
|
||||
}
|
||||
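The table overrides above colour the CPU and memory columns with three `range` value mappings: up to 65, up to 90, and from 90. One plausible reading of that configuration is sketched below. The first-match-wins order and the inclusive bounds are assumptions, not something this diff confirms about Nightingale's renderer, and `pick_color` is a hypothetical helper.

```python
from typing import Optional

# Mappings copied from the override for target "A" (CPU usage).
MAPPINGS = [
    {"type": "range", "match": {"to": 65}, "result": {"color": "#2c9d3d"}},
    {"type": "range", "match": {"to": 90}, "result": {"color": "#ff656b"}},
    {"type": "range", "match": {"from": 90}, "result": {"color": "#f50505"}},
]


def pick_color(value: float, mappings: list[dict]) -> Optional[str]:
    """Return the colour of the first range mapping that contains the value."""
    for mapping in mappings:
        lower = mapping["match"].get("from")  # a missing bound means open-ended
        upper = mapping["match"].get("to")
        if (lower is None or value >= lower) and (upper is None or value <= upper):
            return mapping["result"]["color"]
    return None


for usage in (42.0, 80.0, 95.0):
    print(usage, pick_color(usage, MAPPINGS))
```

Under this reading the order of the list matters: the second mapping has no lower bound, so it only behaves as "65 to 90" because the first mapping has already claimed everything up to 65.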
@@ -1437,7 +1437,7 @@
|
||||
"displayMode": "hidden"
|
||||
},
|
||||
"standardOptions": {
|
||||
"util": "bytesIEC",
|
||||
"util": "bitsIEC",
|
||||
"decimals": 0
|
||||
},
|
||||
"thresholds": {
|
||||
@@ -1732,4 +1732,4 @@
|
||||
],
|
||||
"version": "3.0.0"
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
BIN
integrations/n9e/icon/n9e-circle.png
Normal file
|
After Width: | Height: | Size: 21 KiB |
@@ -1,41 +1,57 @@
|
||||
{
|
||||
"name": "PING detection",
|
||||
"name": "PING detection by UlricQin",
|
||||
"tags": "",
|
||||
"ident": "",
|
||||
"configs": {
|
||||
"panels": [
|
||||
{
|
||||
"collapsed": true,
|
||||
"id": "eb08a300-c59a-4b0d-8537-62512e833f48",
|
||||
"layout": {
|
||||
"h": 1,
|
||||
"i": "eb08a300-c59a-4b0d-8537-62512e833f48",
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 0
|
||||
},
|
||||
"name": "Default chart group",
|
||||
"type": "row"
|
||||
},
|
||||
{
|
||||
"custom": {
|
||||
"aggrDimension": "target",
|
||||
"calc": "lastNotNull",
|
||||
"displayMode": "labelValuesToRows",
|
||||
"showHeader": true
|
||||
},
|
||||
"type": "table",
|
||||
"id": "1677138f-0f33-485c-8ee1-2db24cabbf54",
|
||||
"layout": {
|
||||
"h": 15,
|
||||
"i": "1677138f-0f33-485c-8ee1-2db24cabbf54",
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 1
|
||||
"y": 0,
|
||||
"i": "1677138f-0f33-485c-8ee1-2db24cabbf54",
|
||||
"isResizable": true
|
||||
},
|
||||
"version": "3.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"datasourceValue": "${prom}",
|
||||
"targets": [
|
||||
{
|
||||
"expr": "max(ping_result_code) by (target)",
|
||||
"legend": "UP?",
|
||||
"refId": "A"
|
||||
},
|
||||
{
|
||||
"expr": "max(ping_percent_packet_loss) by (target)",
|
||||
"legend": "Packet Loss %",
|
||||
"refId": "B"
|
||||
},
|
||||
{
|
||||
"expr": "max(ping_maximum_response_ms) by (target) ",
|
||||
"legend": "Latency(ms)",
|
||||
"refId": "C"
|
||||
}
|
||||
],
|
||||
"transformations": [
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {}
|
||||
}
|
||||
],
|
||||
"name": "Ping",
|
||||
"custom": {
|
||||
"showHeader": true,
|
||||
"colorMode": "background",
|
||||
"calc": "lastNotNull",
|
||||
"displayMode": "labelValuesToRows",
|
||||
"aggrDimension": "target"
|
||||
},
|
||||
"options": {
|
||||
"standardOptions": {},
|
||||
"valueMappings": []
|
||||
"valueMappings": [],
|
||||
"standardOptions": {}
|
||||
},
|
||||
"overrides": [
|
||||
{
|
||||
@@ -50,7 +66,7 @@
|
||||
"special": 0
|
||||
},
|
||||
"result": {
|
||||
- "color": "#417505",
|
||||
+ "color": "#2c9d3d",
|
||||
"text": "UP"
|
||||
},
|
||||
"type": "special"
|
||||
@@ -68,38 +84,88 @@
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
],
|
||||
"targets": [
|
||||
{
|
||||
"expr": "max(ping_result_code) by (target)",
|
||||
"legend": "UP?",
|
||||
"refId": "A"
|
||||
},
|
||||
{
|
||||
"expr": "max(ping_percent_packet_loss) by (target)",
|
||||
"legend": "Packet Loss %",
|
||||
"refId": "B"
|
||||
"type": "special",
|
||||
"matcher": {
|
||||
"value": "B"
|
||||
},
|
||||
"properties": {
|
||||
"valueMappings": [
|
||||
{
|
||||
"type": "range",
|
||||
"result": {
|
||||
"color": "#f30a0a"
|
||||
},
|
||||
"match": {
|
||||
"from": 1
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "special",
|
||||
"result": {
|
||||
"color": "#2c9d3d"
|
||||
},
|
||||
"match": {
|
||||
"special": 0
|
||||
}
|
||||
}
|
||||
],
|
||||
"standardOptions": {}
|
||||
}
|
||||
},
|
||||
{
|
||||
"expr": "max(httpresponse_response_time) by (target)",
|
||||
"legend": "latency(s)",
|
||||
"refId": "C"
|
||||
"type": "special",
|
||||
"matcher": {
|
||||
"value": "C"
|
||||
},
|
||||
"properties": {
|
||||
"valueMappings": [
|
||||
{
|
||||
"type": "range",
|
||||
"result": {
|
||||
"color": "#2c9d3d"
|
||||
},
|
||||
"match": {
|
||||
"from": null,
|
||||
"to": 100
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "range",
|
||||
"result": {
|
||||
"color": "#ff8286"
|
||||
},
|
||||
"match": {
|
||||
"to": 300
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "range",
|
||||
"result": {
|
||||
"color": "#f00808"
|
||||
},
|
||||
"match": {
|
||||
"to": null,
|
||||
"from": 1000
|
||||
}
|
||||
}
|
||||
],
|
||||
"standardOptions": {
|
||||
"util": "milliseconds"
|
||||
}
|
||||
}
|
||||
}
|
||||
],
|
||||
"type": "table",
|
||||
"version": "2.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"datasourceValue": "${prom}"
|
||||
]
|
||||
}
|
||||
],
|
||||
"version": "3.0.0",
|
||||
"var": [
|
||||
{
|
||||
"definition": "prometheus",
|
||||
"name": "prom",
|
||||
"type": "datasource",
|
||||
"definition": "prometheus"
|
||||
"type": "datasource"
|
||||
}
|
||||
]
|
||||
],
|
||||
"version": "3.0.0"
|
||||
}
|
||||
}
|
||||
68
integrations/processes/alerts/categraf-processes.json
Normal file
@@ -0,0 +1,68 @@
|
||||
[
|
||||
{
|
||||
"cate": "prometheus",
|
||||
"datasource_ids": [
|
||||
0
|
||||
],
|
||||
"name": "Too many running processes",
|
||||
"note": "",
|
||||
"prod": "metric",
|
||||
"algorithm": "",
|
||||
"algo_params": null,
|
||||
"delay": 0,
|
||||
"severity": 0,
|
||||
"severities": [
|
||||
2
|
||||
],
|
||||
"disabled": 0,
|
||||
"prom_for_duration": 60,
|
||||
"prom_ql": "",
|
||||
"rule_config": {
|
||||
"queries": [
|
||||
{
|
||||
"prom_ql": "processes_running > (system_n_cpus * 3)",
|
||||
"severity": 2
|
||||
}
|
||||
]
|
||||
},
|
||||
"prom_eval_interval": 30,
|
||||
"enable_stime": "00:00",
|
||||
"enable_stimes": [
|
||||
"00:00"
|
||||
],
|
||||
"enable_etime": "23:59",
|
||||
"enable_etimes": [
|
||||
"23:59"
|
||||
],
|
||||
"enable_days_of_week": [
|
||||
"0",
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6"
|
||||
],
|
||||
"enable_days_of_weeks": [
|
||||
[
|
||||
"0",
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6"
|
||||
]
|
||||
],
|
||||
"enable_in_bg": 1,
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [],
|
||||
"notify_repeat_step": 60,
|
||||
"notify_max_number": 10,
|
||||
"recover_duration": 0,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": [],
|
||||
"annotations": {}
|
||||
}
|
||||
]
|
||||
234
integrations/processes/dashboards/categraf-processes.json
Normal file
@@ -0,0 +1,234 @@
|
||||
{
|
||||
"name": "Processes by UlricQin",
|
||||
"tags": "Categraf Linux OS",
|
||||
"ident": "",
|
||||
"configs": {
|
||||
"var": [
|
||||
{
|
||||
"name": "Datasource",
|
||||
"label": "",
|
||||
"type": "datasource",
|
||||
"definition": "prometheus",
|
||||
"defaultValue": 37
|
||||
},
|
||||
{
|
||||
"name": "ident",
|
||||
"label": "Host",
|
||||
"type": "query",
|
||||
"datasource": {
|
||||
"cate": "prometheus",
|
||||
"value": "${Datasource}"
|
||||
},
|
||||
"definition": "label_values(processes_running, ident)",
|
||||
"multi": true,
|
||||
"allOption": true
|
||||
}
|
||||
],
|
||||
"panels": [
|
||||
{
|
||||
"type": "barGauge",
|
||||
"id": "adc3f1d3-6d0d-4c1e-80ca-5b6d8103bac5",
|
||||
"layout": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 0,
|
||||
"i": "adc3f1d3-6d0d-4c1e-80ca-5b6d8103bac5",
|
||||
"isResizable": true
|
||||
},
|
||||
"version": "3.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"datasourceValue": "${Datasource}",
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "processes_running{ident=~\"$ident\"}",
|
||||
"legend": "{{ident}}",
|
||||
"instant": true
|
||||
}
|
||||
],
|
||||
"transformations": [
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {}
|
||||
}
|
||||
],
|
||||
"name": "Running Processes",
|
||||
"custom": {
|
||||
"calc": "lastNotNull",
|
||||
"baseColor": "#9470FF",
|
||||
"serieWidth": 20,
|
||||
"sortOrder": "desc"
|
||||
},
|
||||
"options": {
|
||||
"valueMappings": [
|
||||
{
|
||||
"type": "range",
|
||||
"result": {
|
||||
"color": "#f10808"
|
||||
},
|
||||
"match": {
|
||||
"from": 50
|
||||
}
|
||||
}
|
||||
],
|
||||
"standardOptions": {}
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "barGauge",
|
||||
"id": "659f5f75-24ca-493c-97cb-3d99abd52172",
|
||||
"layout": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 0,
|
||||
"i": "df457bf0-17c8-4d05-a527-cfaf0f2b844c",
|
||||
"isResizable": true
|
||||
},
|
||||
"version": "3.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"datasourceValue": "${Datasource}",
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "processes_total{ident=~\"$ident\"}",
|
||||
"legend": "{{ident}}",
|
||||
"instant": true
|
||||
}
|
||||
],
|
||||
"transformations": [
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {}
|
||||
}
|
||||
],
|
||||
"name": "Total Processes",
|
||||
"custom": {
|
||||
"calc": "lastNotNull",
|
||||
"baseColor": "#9470FF",
|
||||
"serieWidth": 20,
|
||||
"sortOrder": "desc"
|
||||
},
|
||||
"options": {
|
||||
"valueMappings": [
|
||||
{
|
||||
"type": "range",
|
||||
"result": {
|
||||
"color": "#f10808"
|
||||
},
|
||||
"match": {
|
||||
"from": 600
|
||||
}
|
||||
}
|
||||
],
|
||||
"standardOptions": {}
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "barGauge",
|
||||
"id": "5e849509-1c41-44c7-85ee-d8c0adf7c623",
|
||||
"layout": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 8,
|
||||
"i": "62291285-be84-470a-9ccc-53be7a8733fd",
|
||||
"isResizable": true
|
||||
},
|
||||
"version": "3.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"datasourceValue": "${Datasource}",
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "processes_total_threads{ident=~\"$ident\"}",
|
||||
"legend": "{{ident}}",
|
||||
"instant": true
|
||||
}
|
||||
],
|
||||
"transformations": [
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {}
|
||||
}
|
||||
],
|
||||
"name": "Total Threads",
|
||||
"custom": {
|
||||
"calc": "lastNotNull",
|
||||
"baseColor": "#9470FF",
|
||||
"serieWidth": 20,
|
||||
"sortOrder": "desc"
|
||||
},
|
||||
"options": {
|
||||
"valueMappings": [
|
||||
{
|
||||
"type": "range",
|
||||
"result": {
|
||||
"color": "#ff8286"
|
||||
},
|
||||
"match": {
|
||||
"from": 2000
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "range",
|
||||
"result": {
|
||||
"color": "#f30909"
|
||||
},
|
||||
"match": {
|
||||
"from": 4000
|
||||
}
|
||||
}
|
||||
],
|
||||
"standardOptions": {}
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "table",
|
||||
"id": "b2850506-6cdd-48cc-9223-70acff9212b0",
|
||||
"layout": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 8,
|
||||
"i": "b2850506-6cdd-48cc-9223-70acff9212b0",
|
||||
"isResizable": true
|
||||
},
|
||||
"version": "3.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"datasourceValue": "${Datasource}",
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "sum({__name__=~\"processes_sleeping|processes_dead|processes_paging|processes_total_threads|processes_total|processes_idle|processes_running|processes_zombies|processes_stopped|processes_unknown|processes_blocked\", ident=~\"$ident\"}) by (__name__)",
|
||||
"instant": true
|
||||
}
|
||||
],
|
||||
"transformations": [
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {}
|
||||
}
|
||||
],
|
||||
"name": "SUM by Process state",
|
||||
"custom": {
|
||||
"showHeader": true,
|
||||
"colorMode": "value",
|
||||
"calc": "lastNotNull",
|
||||
"displayMode": "labelsOfSeriesToRows",
|
||||
"sortColumn": "value",
|
||||
"sortOrder": "descend",
|
||||
"columns": []
|
||||
},
|
||||
"options": {
|
||||
"standardOptions": {}
|
||||
},
|
||||
"overrides": [
|
||||
{}
|
||||
]
|
||||
}
|
||||
],
|
||||
"version": "3.0.0"
|
||||
}
|
||||
}
|
||||
BIN
integrations/processes/icon/linux.png
Normal file
|
After Width: | Height: | Size: 7.7 KiB |
23
integrations/processes/markdown/readme.md
Normal file
@@ -0,0 +1,23 @@
## Categraf as collector

Configuration file: `conf/input.processes/processes.toml`

The default configuration is shown below (it can usually be left unchanged):

```toml
# # collect interval
# interval = 15

# # force use ps command to gather
# force_ps = false

# # force use /proc to gather
# force_proc = false
```

There are two collection methods: running the `ps` command, or reading the `/proc` directory directly. The default is the latter. To force collection through the `ps` command, enable `force_ps`:

```
force_ps = true
```
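
As a rough illustration of the `/proc` method, the sketch below counts processes by state from `/proc/<pid>/stat` lines. It is not Categraf's implementation: the state-letter mapping and the helper names `parse_state` and `count_states` are assumptions made for this example, and only the `processes_*` metric names come from the dashboard query in this change.

```python
from collections import Counter

# Assumed mapping from the one-letter state in /proc/<pid>/stat to the
# metric suffixes used by the dashboard; illustrative, not authoritative.
STATE_TO_METRIC = {
    "R": "running",
    "S": "sleeping",
    "D": "blocked",
    "Z": "zombies",
    "X": "dead",
    "T": "stopped",
    "t": "stopped",
    "W": "paging",
    "I": "idle",
}


def parse_state(stat_line: str) -> str:
    """Return the state letter from one /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split after the last ')'.
    """
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def count_states(stat_lines: list) -> dict:
    """Count processes per metric name, e.g. {'processes_running': 1}."""
    counts = Counter()
    for line in stat_lines:
        suffix = STATE_TO_METRIC.get(parse_state(line), "unknown")
        counts["processes_" + suffix] += 1
    counts["processes_total"] = len(stat_lines)
    return dict(counts)
```

On a Linux host the input lines could be read from the `/proc/<pid>/stat` files; here they are passed in as plain strings so the parsing can be tried anywhere.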
|
||||
|
||||
62
integrations/procstat/alerts/categraf-procstat.json
Normal file
@@ -0,0 +1,62 @@
|
||||
[
|
||||
{
|
||||
"name": "there is a process count of 0, indicating that a certain process may have crashed",
|
||||
"note": "",
|
||||
"severity": 1,
|
||||
"disabled": 0,
|
||||
"prom_for_duration": 60,
|
||||
"prom_ql": "procstat_lookup_count == 0",
|
||||
"prom_eval_interval": 15,
|
||||
"enable_stime": "00:00",
|
||||
"enable_etime": "23:59",
|
||||
"enable_days_of_week": [
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6",
|
||||
"0"
|
||||
],
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [
|
||||
"email",
|
||||
"dingtalk",
|
||||
"wecom"
|
||||
],
|
||||
"notify_repeat_step": 60,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": []
|
||||
},
|
||||
{
|
||||
"name": "process handle limit is too low",
|
||||
"note": "",
|
||||
"severity": 3,
|
||||
"disabled": 0,
|
||||
"prom_for_duration": 60,
|
||||
"prom_ql": "procstat_rlimit_num_fds_soft < 2048",
|
||||
"prom_eval_interval": 15,
|
||||
"enable_stime": "00:00",
|
||||
"enable_etime": "23:59",
|
||||
"enable_days_of_week": [
|
||||
"1",
|
||||
"2",
|
||||
"3",
|
||||
"4",
|
||||
"5",
|
||||
"6",
|
||||
"0"
|
||||
],
|
||||
"notify_recovered": 1,
|
||||
"notify_channels": [
|
||||
"email",
|
||||
"dingtalk",
|
||||
"wecom"
|
||||
],
|
||||
"notify_repeat_step": 60,
|
||||
"callbacks": [],
|
||||
"runbook_url": "",
|
||||
"append_tags": []
|
||||
}
|
||||
]
|
||||
642
integrations/procstat/dashboards/categraf-procstat.json
Normal file
@@ -0,0 +1,642 @@
|
||||
{
|
||||
"name": "Procstat by UlricQin",
|
||||
"tags": "Categraf OS",
|
||||
"ident": "",
|
||||
"configs": {
|
||||
"var": [
|
||||
{
|
||||
"name": "Datasource",
|
||||
"type": "datasource",
|
||||
"definition": "prometheus",
|
||||
"defaultValue": 37
|
||||
},
|
||||
{
|
||||
"name": "ident",
|
||||
"label": "Host",
|
||||
"type": "query",
|
||||
"datasource": {
|
||||
"cate": "prometheus",
|
||||
"value": "${Datasource}"
|
||||
},
|
||||
"definition": "label_values(system_load_norm_1, ident)",
|
||||
"multi": true,
|
||||
"allOption": true
|
||||
},
|
||||
{
|
||||
"name": "search_string",
|
||||
"label": "Proc",
|
||||
"type": "query",
|
||||
"datasource": {
|
||||
"cate": "prometheus",
|
||||
"value": "${Datasource}"
|
||||
},
|
||||
"definition": "label_values(procstat_lookup_count{ident=~\"$ident\"}, search_string)",
|
||||
"multi": true,
|
||||
"allOption": true
|
||||
}
|
||||
],
|
||||
"panels": [
|
||||
{
|
||||
"type": "stat",
|
||||
"id": "be9aac6c-4401-4c61-8c43-574cf314ffef",
|
||||
"layout": {
|
||||
"h": 4,
|
||||
"w": 5,
|
||||
"x": 0,
|
||||
"y": 0,
|
||||
"i": "be9aac6c-4401-4c61-8c43-574cf314ffef",
|
||||
"isResizable": true
|
||||
},
|
||||
"version": "3.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"datasourceValue": "${Datasource}",
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "procstat_lookup_count{ident=~\"$ident\", search_string=~\"$search_string\"}",
|
||||
"legend": "{{ident}} {{search_string}}",
|
||||
"instant": true
|
||||
}
|
||||
],
|
||||
"transformations": [
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {}
|
||||
}
|
||||
],
|
||||
"name": "Proc Count Now",
|
||||
"custom": {
|
||||
"textMode": "valueAndName",
|
||||
"colorMode": "background",
|
||||
"calc": "lastNotNull",
|
||||
"valueField": "Value",
|
||||
"colSpan": 1,
|
||||
"textSize": {}
|
||||
},
|
||||
"options": {
|
||||
"standardOptions": {},
|
||||
"thresholds": {
|
||||
"steps": [
|
||||
{
|
||||
"color": "#634CD9",
|
||||
"value": null,
|
||||
"type": "base"
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "table",
|
||||
"id": "da621e2c-ae2b-4375-9a66-2bec7832490b",
|
||||
"layout": {
|
||||
"h": 4,
|
||||
"w": 9,
|
||||
"x": 5,
|
||||
"y": 0,
|
||||
"i": "79db82d9-5f46-4c45-bb9f-c23f94d99e0a",
|
||||
"isResizable": true
|
||||
},
|
||||
"version": "3.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"datasourceValue": "${Datasource}",
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "procstat_lookup_count{ident=~\"$ident\", search_string=~\"$search_string\"}",
|
||||
"legend": "{{ident}} {{search_string}}",
|
||||
"instant": true
|
||||
}
|
||||
],
|
||||
"transformations": [
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {}
|
||||
}
|
||||
],
|
||||
"name": "Proc Count Table",
|
||||
"custom": {
|
||||
"showHeader": true,
|
||||
"colorMode": "background",
|
||||
"calc": "lastNotNull",
|
||||
"displayMode": "labelsOfSeriesToRows",
|
||||
"columns": [
|
||||
"ident",
|
||||
"search_string",
|
||||
"value"
|
||||
],
|
||||
"sortColumn": "ident",
|
||||
"sortOrder": "ascend"
|
||||
},
|
||||
"options": {
|
||||
"valueMappings": [
|
||||
{
|
||||
"type": "special",
|
||||
"result": {
|
||||
"color": "#fa0c0c"
|
||||
},
|
||||
"match": {
|
||||
"special": 0
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "range",
|
||||
"result": {
|
||||
"color": "#2c9d3d"
|
||||
},
|
||||
"match": {
|
||||
"from": 1
|
||||
}
|
||||
}
|
||||
],
|
||||
"standardOptions": {}
|
||||
},
|
||||
"overrides": [
|
||||
{}
|
||||
]
|
||||
},
|
||||
{
|
||||
"type": "timeseries",
|
||||
"id": "881c04fd-8804-432e-9b34-b4761590de20",
|
||||
"layout": {
|
||||
"h": 4,
|
||||
"w": 10,
|
||||
"x": 14,
|
||||
"y": 0,
|
||||
"i": "24b55362-d900-43c0-98d5-f2e994bf22a6",
|
||||
"isResizable": true
|
||||
},
|
||||
"version": "3.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"datasourceValue": "${Datasource}",
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "procstat_lookup_count{ident=~\"$ident\", search_string=~\"$search_string\"}",
|
||||
"legend": "{{ident}} {{search_string}}",
|
||||
"instant": false
|
||||
}
|
||||
],
|
||||
"transformations": [
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {}
|
||||
}
|
||||
],
|
||||
"name": "Proc Count Trend",
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden"
|
||||
},
|
||||
"standardOptions": {},
|
||||
"thresholds": {
|
||||
"steps": [
|
||||
{
|
||||
"color": "#634CD9",
|
||||
"value": null,
|
||||
"type": "base"
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "smooth",
|
||||
"spanNulls": false,
|
||||
"lineWidth": 1,
|
||||
"fillOpacity": 0.5,
|
||||
"gradientMode": "none",
|
||||
"stack": "off",
|
||||
"scaleDistribution": {
|
||||
"type": "linear"
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "stat",
|
||||
"id": "ffeb0fc6-ee02-4fdd-a8e3-ec2b9db9c23c",
|
||||
"layout": {
|
||||
"h": 4,
|
||||
"w": 5,
|
||||
"x": 0,
|
||||
"y": 4,
|
||||
"i": "acd6e7b5-99f5-4d9b-9124-8072c14e5fea",
|
||||
"isResizable": true
|
||||
},
|
||||
"version": "3.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"datasourceValue": "${Datasource}",
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "procstat_uptime_minimum{ident=~\"$ident\", search_string=~\"$search_string\"}",
|
||||
"legend": "{{ident}} {{search_string}}",
|
||||
"instant": true
|
||||
}
|
||||
],
|
||||
"transformations": [
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {}
|
||||
}
|
||||
],
|
||||
"name": "Proc Uptime",
|
||||
"custom": {
|
||||
"textMode": "valueAndName",
|
||||
"colorMode": "value",
|
||||
"calc": "lastNotNull",
|
||||
"valueField": "Value",
|
||||
"colSpan": 1,
|
||||
"textSize": {}
|
||||
},
|
||||
"options": {
|
||||
"standardOptions": {
|
||||
"util": "humantimeSeconds"
|
||||
},
|
||||
"thresholds": {
|
||||
"steps": [
|
||||
{
|
||||
"color": "#2c9d3d",
|
||||
"value": null,
|
||||
"type": "base"
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "timeseries",
|
||||
"id": "c642a30a-da86-402c-87bf-c2f98616bf95",
|
||||
"layout": {
|
||||
"h": 4,
|
||||
"w": 9,
|
||||
"x": 5,
|
||||
"y": 4,
|
||||
"i": "c642a30a-da86-402c-87bf-c2f98616bf95",
|
||||
"isResizable": true
|
||||
},
|
||||
"version": "3.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"datasourceValue": "${Datasource}",
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "procstat_cpu_usage_total{ident=~\"$ident\", search_string=~\"$search_string\"}"
|
||||
}
|
||||
],
|
||||
"transformations": [
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {}
|
||||
}
|
||||
],
|
||||
"name": "CPU Util",
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden"
|
||||
},
|
||||
"standardOptions": {
|
||||
"util": "percent"
|
||||
},
|
||||
"thresholds": {
|
||||
"steps": [
|
||||
{
|
||||
"color": "#634CD9",
|
||||
"value": null,
|
||||
"type": "base"
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "smooth",
|
||||
"spanNulls": false,
|
||||
"lineWidth": 1,
|
||||
"fillOpacity": 0.5,
|
||||
"gradientMode": "none",
|
||||
"stack": "off",
|
||||
"scaleDistribution": {
|
||||
"type": "linear"
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "timeseries",
|
||||
"id": "cbc2444e-49c7-45e1-b64e-cd1282b5a419",
|
||||
"layout": {
|
||||
"h": 4,
|
||||
"w": 10,
|
||||
"x": 14,
|
||||
"y": 4,
|
||||
"i": "198846a2-4794-4ba9-9c2d-137bce22f266",
|
||||
"isResizable": true
|
||||
},
|
||||
"version": "3.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"datasourceValue": "${Datasource}",
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "procstat_mem_usage_total{ident=~\"$ident\", search_string=~\"$search_string\"}"
|
||||
}
|
||||
],
|
||||
"transformations": [
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {}
|
||||
}
|
||||
],
|
||||
"name": "Mem Util",
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden"
|
||||
},
|
||||
"standardOptions": {
|
||||
"util": "percent"
|
||||
},
|
||||
"thresholds": {
|
||||
"steps": [
|
||||
{
|
||||
"color": "#634CD9",
|
||||
"value": null,
|
||||
"type": "base"
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "smooth",
|
||||
"spanNulls": false,
|
||||
"lineWidth": 1,
|
||||
"fillOpacity": 0.5,
|
||||
"gradientMode": "none",
|
||||
"stack": "off",
|
||||
"scaleDistribution": {
|
||||
"type": "linear"
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "timeseries",
|
||||
"id": "d2bff162-5801-4d85-94d7-d63145d5b935",
|
||||
"layout": {
|
||||
"h": 4,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 8,
|
||||
"i": "a208e192-cf74-468b-9bcb-cb81c8d78d24",
|
||||
"isResizable": true
|
||||
},
|
||||
"version": "3.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"datasourceValue": "${Datasource}",
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "procstat_num_fds_total{ident=~\"$ident\", search_string=~\"$search_string\"}/procstat_rlimit_num_fds_soft_minimum{ident=~\"$ident\", search_string=~\"$search_string\"}*100"
|
||||
}
|
||||
],
|
||||
"transformations": [
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {}
|
||||
}
|
||||
],
|
||||
"name": "FD soft Util",
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden"
|
||||
},
|
||||
"standardOptions": {
|
||||
"util": "percent"
|
||||
},
|
||||
"thresholds": {
|
||||
"steps": [
|
||||
{
|
||||
"color": "#634CD9",
|
||||
"value": null,
|
||||
"type": "base"
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "smooth",
|
||||
"spanNulls": false,
|
||||
"lineWidth": 1,
|
||||
"fillOpacity": 0.5,
|
||||
"gradientMode": "none",
|
||||
"stack": "off",
|
||||
"scaleDistribution": {
|
||||
"type": "linear"
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "timeseries",
|
||||
"id": "46a71143-84b5-4dde-87db-2f0403df6519",
|
||||
"layout": {
|
||||
"h": 4,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 8,
|
||||
"i": "22dfb5e4-1d17-4e06-a9b4-b25cb33d1c20",
|
||||
"isResizable": true
|
||||
},
|
||||
"version": "3.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"datasourceValue": "${Datasource}",
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "procstat_num_fds_total{ident=~\"$ident\", search_string=~\"$search_string\"}/procstat_rlimit_num_fds_hard_minimum{ident=~\"$ident\", search_string=~\"$search_string\"}*100"
|
||||
}
|
||||
],
|
||||
"transformations": [
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {}
|
||||
}
|
||||
],
|
||||
"name": "FD hard Util",
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden"
|
||||
},
|
||||
"standardOptions": {
|
||||
"util": "percent"
|
||||
},
|
||||
"thresholds": {
|
||||
"steps": [
|
||||
{
|
||||
"color": "#634CD9",
|
||||
"value": null,
|
||||
"type": "base"
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "smooth",
|
||||
"spanNulls": false,
|
||||
"lineWidth": 1,
|
||||
"fillOpacity": 0.5,
|
||||
"gradientMode": "none",
|
||||
"stack": "off",
|
||||
"scaleDistribution": {
|
||||
"type": "linear"
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "timeseries",
|
||||
"id": "3dda4eb5-a27f-4d54-9547-ae8f0ac9bb96",
|
||||
"layout": {
|
||||
"h": 4,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 12,
|
||||
"i": "3dda4eb5-a27f-4d54-9547-ae8f0ac9bb96",
|
||||
"isResizable": true
|
||||
},
|
||||
"version": "3.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"datasourceValue": "${Datasource}",
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "procstat_read_bytes_total{ident=~\"$ident\", search_string=~\"$search_string\"}"
|
||||
}
|
||||
],
|
||||
"transformations": [
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {}
|
||||
}
|
||||
],
|
||||
"name": "Read bytes",
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden"
|
||||
},
|
||||
"standardOptions": {
|
||||
"util": "bytesIEC",
|
||||
"decimals": 1
|
||||
},
|
||||
"thresholds": {
|
||||
"steps": [
|
||||
{
|
||||
"color": "#634CD9",
|
||||
"value": null,
|
||||
"type": "base"
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "smooth",
|
||||
"spanNulls": false,
|
||||
"lineWidth": 1,
|
||||
"fillOpacity": 0.5,
|
||||
"gradientMode": "none",
|
||||
"stack": "off",
|
||||
"scaleDistribution": {
|
||||
"type": "linear"
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"type": "timeseries",
|
||||
"id": "c97403f4-618d-4037-8ea7-5deb32eb8d56",
|
||||
"layout": {
|
||||
"h": 4,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 12,
|
||||
"i": "ae0dc449-8263-4f38-8c52-d50b3cb3f1b4",
|
||||
"isResizable": true
|
||||
},
|
||||
"version": "3.0.0",
|
||||
"datasourceCate": "prometheus",
|
||||
"datasourceValue": "${Datasource}",
|
||||
"targets": [
|
||||
{
|
||||
"refId": "A",
|
||||
"expr": "procstat_write_bytes_total{ident=~\"$ident\", search_string=~\"$search_string\"}"
|
||||
}
|
||||
],
|
||||
"transformations": [
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {}
|
||||
}
|
||||
],
|
||||
"name": "Write bytes",
|
||||
"options": {
|
||||
"tooltip": {
|
||||
"mode": "all",
|
||||
"sort": "none"
|
||||
},
|
||||
"legend": {
|
||||
"displayMode": "hidden"
|
||||
},
|
||||
"standardOptions": {
|
||||
"util": "bytesIEC",
|
||||
"decimals": 1
|
||||
},
|
||||
"thresholds": {
|
||||
"steps": [
|
||||
{
|
||||
"color": "#634CD9",
|
||||
"value": null,
|
||||
"type": "base"
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"custom": {
|
||||
"drawStyle": "lines",
|
||||
"lineInterpolation": "smooth",
|
||||
"spanNulls": false,
|
||||
"lineWidth": 1,
|
||||
"fillOpacity": 0.5,
|
||||
"gradientMode": "none",
|
||||
"stack": "off",
|
||||
"scaleDistribution": {
|
||||
"type": "linear"
|
||||
}
|
||||
}
|
||||
}
|
||||
],
|
||||
"version": "3.0.0"
|
||||
}
|
||||
}
|
||||
BIN
integrations/procstat/icon/processwire.png
Normal file
|
After Width: | Height: | Size: 848 B |
80
integrations/procstat/markdown/readme.md
Normal file
@@ -0,0 +1,80 @@
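## Categraf as collector

Configuration file: `conf/input.procstat/procstat.toml`

procstat is the process monitoring plugin. It has two core functions: monitoring whether a process is alive, and monitoring how many resources a process uses (CPU, memory, file handles, and so on).

### Liveness monitoring

If the process listens on a port, use net_response for liveness monitoring instead of procstat. A listening port means the process is definitely alive, whereas a live process does not guarantee that its port is listening.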
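
The idea behind a port-based liveness check can be sketched as a plain TCP connect. This is only an illustration of the reasoning, not the net_response plugin itself; the function name `port_is_listening` and its parameters are hypothetical.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # A successful connect proves that some process is accepting
        # connections on this port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, unreachable host, and so on.
        return False
```

A successful connect tells you more than a process count does: the process exists and is also accepting connections.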
### Process filtering

A machine runs many processes, so Categraf needs to be told which ones to monitor. The configuration items that start with `search` filter the processes:

```toml
[[instances]]
# # executable name (ie, pgrep <search_exec_substring>)
search_exec_substring = "nginx"

# # pattern as argument for pgrep (ie, pgrep -f <search_cmdline_substring>)
# search_cmdline_substring = "n9e server"

# # windows service name
# search_win_service = ""
```

Choose one of the three `search` settings above for each collection target. An additional setting, `search_user`, works together with `search_exec_substring` or `search_cmdline_substring` and restricts the match to processes owned by the given username. If you do not need to filter by username, leave it commented out.

```toml
# # search process with specific user, option with exec_substring or cmdline_substring
# search_user = ""
```

In the default process monitoring configuration, `[[instances]]` is commented out; remember to uncomment it.
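
The filtering rules above can be sketched roughly as follows. The sketch assumes plain substring matching and an exact username comparison, which may differ from how Categraf actually matches; the `Proc` record and `select_procs` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Proc:
    pid: int
    exe: str      # executable name, e.g. "nginx"
    cmdline: str  # full command line
    user: str     # owning username


def select_procs(procs: List[Proc], exec_substring: str = "",
                 cmdline_substring: str = "", user: str = "") -> List[Proc]:
    """Pick processes using one search setting, optionally narrowed by user."""
    if exec_substring:
        matched = [p for p in procs if exec_substring in p.exe]
    elif cmdline_substring:
        matched = [p for p in procs if cmdline_substring in p.cmdline]
    else:
        # No search setting given: nothing is selected.
        return []
    if user:
        matched = [p for p in matched if p.user == user]
    return matched
```

For example, `select_procs(procs, exec_substring="nginx", user="www")` would keep only the nginx processes owned by `www`.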
### mode

The `mode` setting accepts two values, `solaris` and `irix`, and defaults to `irix`. It determines how CPU usage is calculated:

```go
func (ins *Instance) gatherCPU(slist *types.SampleList, procs map[PID]Process, tags map[string]string, solarisMode bool) {
	var value float64
	for pid := range procs {
		v, err := procs[pid].Percent(time.Duration(0))
		if err == nil {
			if solarisMode {
				// solaris: divide the per-process percentage by the number of CPUs
				value += v / float64(runtime.NumCPU())
				slist.PushFront(types.NewSample("cpu_usage", v/float64(runtime.NumCPU()), map[string]string{"pid": fmt.Sprint(pid)}, tags))
			} else {
				// irix: report the raw per-process percentage
				value += v
				slist.PushFront(types.NewSample("cpu_usage", v, map[string]string{"pid": fmt.Sprint(pid)}, tags))
			}
		}
	}

	// With gather_total enabled, also report the sum over all matched processes.
	if ins.GatherTotal {
		slist.PushFront(types.NewSample("cpu_usage_total", value, tags))
	}
}
```
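
To make the difference between the two modes concrete, here is a small sketch of the same arithmetic outside Categraf. The function name `cpu_usage` and its arguments are hypothetical; the calculation mirrors the Go excerpt above, where `solaris` divides each per-process percentage by the number of CPUs and `irix` leaves it unchanged.

```python
def cpu_usage(per_process_percent, num_cpus, mode="irix"):
    """Return (per-process values, total) for the chosen mode."""
    if mode == "solaris":
        # Normalize by the CPU count, so a fully busy machine reads 100.
        values = [v / num_cpus for v in per_process_percent]
    else:
        # irix: one fully busy core reads 100, so totals can exceed 100.
        values = list(per_process_percent)
    return values, sum(values)
```

Under these assumptions, on a 4-core machine two processes measured at 200% and 100% total 300% in `irix` mode and 75% in `solaris` mode.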
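### gather_total

For example, several processes named mysql may be running at the same time. To get the total CPU, memory, fd, and other usage of all mysql processes on the machine, set `gather_total = true`. For uptime and limit, however, `gather_total` takes the minimum across the processes.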
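
The aggregation rule can be sketched as below: sums for usage values, minimums for uptime and limits. The input dictionary keys and the function name `aggregate_total` are made up for this example; the output keys follow the metric suffixes used in the procstat dashboard (`cpu_usage_total`, `uptime_minimum`, and so on).

```python
def aggregate_total(per_process):
    """Combine per-process samples the way gather_total is described.

    per_process is a list of dicts with the hypothetical keys
    cpu, mem, fds, uptime and fd_limit.
    """
    if not per_process:
        return {}
    return {
        # Usage values are summed over all matched processes.
        "cpu_usage_total": sum(p["cpu"] for p in per_process),
        "mem_usage_total": sum(p["mem"] for p in per_process),
        "num_fds_total": sum(p["fds"] for p in per_process),
        # Uptime and limits take the minimum across processes.
        "uptime_minimum": min(p["uptime"] for p in per_process),
        "rlimit_num_fds_soft_minimum": min(p["fd_limit"] for p in per_process),
    }
```

Taking the minimum uptime is useful because it surfaces the most recently restarted process in the group.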
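### gather_per_pid

Taking mysql as the example again: several instances may run on one machine, and you may want the resource usage of each mysql process. In that case set `gather_per_pid` to `true`. The resource usage of every process is then collected, with the pid attached as a label to tell them apart.

### gather_more_metrics

By default the procstat plugin collects only the number of processes. To collect the resources the processes use, enable items in `gather_more_metrics`; each enabled item is collected in addition.

### jvm

`gather_more_metrics` includes a `jvm` item. Enable it for Java processes only, not for others. Note that this monitoring depends on the `jstat` command being available on the machine. The collection code was contributed by a community member; thanks to [@lsy1990](https://github.com/lsy1990).

### One more thing

To monitor a process, edit the Categraf configuration `conf/input.procstat/procstat.toml` on the target machine. If that is too much trouble, you can contact us to purchase the Professional Edition, which supports centralized configuration from the server-side web UI, with no need to log in to the target machine to modify the Categraf configuration.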
|
||||