Mirror of https://github.com/ccfos/nightingale.git (synced 2026-03-03 22:48:56 +00:00)

Compare commits

7 commits: embedded-i... → webhook-ba...
| Author | SHA1 | Date |
|---|---|---|
| | b057940134 | |
| | 4500c4aba8 | |
| | 726e994d58 | |
| | 71c4c24f00 | |
| | d14a834149 | |
| | 5f895552a9 | |
| | 1e3f62b92f | |
README.md (17)
@@ -29,15 +29,15 @@
## What is Nightingale

- Nightingale (夜莺) is an open-source, cloud-native observability and analytics tool. It follows an all-in-one design that combines data collection, visualization, alerting, and data analysis, integrates tightly with the cloud-native ecosystem, and provides out-of-the-box, enterprise-grade monitoring, analysis, and alerting. Nightingale released v1 on GitHub on March 20, 2020 and has since shipped more than 100 versions.
+ Nightingale (夜莺) is an open-source, cloud-native observability and analytics tool. It follows an all-in-one design that combines data collection, visualization, alerting, and data analysis, integrates tightly with the cloud-native ecosystem, and provides out-of-the-box, enterprise-grade monitoring, analysis, and alerting. Nightingale released v1 on github on March 20, 2020 and has since shipped more than 100 versions.

Nightingale was originally developed and open-sourced by Didi and was donated to the China Computer Federation Open Source Development Committee (CCF ODC) on May 11, 2022, becoming the first open-source project the CCF ODC accepted after its founding. Nightingale's core developers are also the original core developers of Open-Falcon (open-sourced in 2014), which adds up to ten years of working to get monitoring right.
## Quick Start

- - 👉 [Documentation](https://flashcat.cloud/docs/) | [Download](https://flashcat.cloud/download/nightingale/)
- - ❤️ [Report a Bug](https://github.com/ccfos/nightingale/issues/new?assignees=&labels=&projects=&template=question.yml)
- - ℹ️ To provide faster access, the documentation and download sites above are hosted on [FlashcatCloud](https://flashcat.cloud)
+ - 👉[Documentation](https://flashcat.cloud/docs/) | [Download](https://flashcat.cloud/download/nightingale/)
+ - ❤️[Report a Bug](https://github.com/ccfos/nightingale/issues/new?assignees=&labels=&projects=&template=question.yml)
+ - ℹ️To provide faster access, the documentation and download sites above are hosted on [FlashcatCloud](https://flashcat.cloud)

## Features
@@ -50,11 +50,6 @@

## Screenshots

You can switch the language and theme in the top-right corner of the page. English, Simplified Chinese, and Traditional Chinese are currently supported.

![instant-query](doc/img/readme/20240513141650.png)

Instant query, similar to the query and analysis page built into Prometheus, is for ad-hoc queries. Nightingale adds some UI polish and ships a set of built-in PromQL queries, so users who are not yet familiar with PromQL can also query quickly.

![instant-query-2](doc/img/readme/20240513141820.png)
@@ -95,8 +90,8 @@

## Community Co-Building

- - ❇️ Please read the [Nightingale open-source project and community governance draft](./doc/community-governance.md). We sincerely welcome every user, developer, company, and organization to use Nightingale, actively report bugs, submit feature requests, share best practices, and build a professional, active Nightingale open-source community together.
- - ❤️ Nightingale contributors
+ - ❇️Please read the [Nightingale open-source project and community governance draft](./doc/community-governance.md). We sincerely welcome every user, developer, company, and organization to use Nightingale, actively report bugs, submit feature requests, share best practices, and build a professional, active Nightingale open-source community together.
+ - Nightingale contributors ❤️

<a href="https://github.com/ccfos/nightingale/graphs/contributors">
  <img src="https://contrib.rocks/image?repo=ccfos/nightingale" />
</a>
README_en.md (129)

@@ -1,113 +1,104 @@
<p align="center">
  <a href="https://github.com/ccfos/nightingale">
  <img src="doc/img/Nightingale_L_V.png" alt="nightingale - cloud native monitoring" width="100" /></a>
</p>
<p align="center">
  <b>Open-source Alert Management Expert, an Integrated Observability Platform</b>
  <img src="doc/img/Nightingale_L_V.png" alt="nightingale - cloud native monitoring" width="240" /></a>
</p>

<p align="center">
  <a href="https://flashcat.cloud/docs/">
  <img alt="GitHub latest release" src="https://img.shields.io/github/v/release/ccfos/nightingale"/>
  <a href="https://n9e.github.io">
  <img alt="Docs" src="https://img.shields.io/badge/docs-get%20started-brightgreen"/></a>
  <a href="https://hub.docker.com/u/flashcatcloud">
  <img alt="Docker pulls" src="https://img.shields.io/docker/pulls/flashcatcloud/nightingale"/></a>
  <img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/ccfos/nightingale">
  <img alt="GitHub Repo issues" src="https://img.shields.io/github/issues/ccfos/nightingale">
  <img alt="GitHub Repo issues closed" src="https://img.shields.io/github/issues-closed/ccfos/nightingale">
  <img alt="GitHub forks" src="https://img.shields.io/github/forks/ccfos/nightingale">
  <a href="https://github.com/ccfos/nightingale/graphs/contributors">
  <img alt="GitHub contributors" src="https://img.shields.io/github/contributors-anon/ccfos/nightingale"/></a>
  <img alt="GitHub Repo stars" src="https://img.shields.io/github/stars/ccfos/nightingale">
  <img alt="GitHub forks" src="https://img.shields.io/github/forks/ccfos/nightingale">
  <br/><img alt="GitHub Repo issues" src="https://img.shields.io/github/issues/ccfos/nightingale">
  <img alt="GitHub Repo issues closed" src="https://img.shields.io/github/issues-closed/ccfos/nightingale">
  <img alt="GitHub latest release" src="https://img.shields.io/github/v/release/ccfos/nightingale"/>
  <img alt="License" src="https://img.shields.io/badge/license-Apache--2.0-blue"/>
  <a href="https://n9e-talk.slack.com/">
  <img alt="GitHub contributors" src="https://img.shields.io/badge/join%20slack-%23n9e-brightgreen.svg"/></a>
  <img alt="License" src="https://img.shields.io/badge/license-Apache--2.0-blue"/>
</p>
<p align="center">
  An open-source cloud-native monitoring system that is <b>all-in-one</b> <br/>
  <b>Out-of-the-box</b>, it integrates data collection, visualization, and monitoring alerts <br/>
  We recommend upgrading your <b>Prometheus + AlertManager + Grafana</b> combination to Nightingale!
</p>

[English](./README_en.md) | [中文](./README.md)

## What is Nightingale

Nightingale aims to combine the advantages of Prometheus and Grafana. It manages alert rules and visualizes metrics, logs, and traces in a beautiful WebUI.
## Highlighted Features

Originally developed and open-sourced by Didi, Nightingale was donated to the China Computer Federation Open Source Development Committee (CCF ODC) on May 11, 2022, becoming the first open-source project accepted by the CCF ODC after its establishment.

- **Out-of-the-box**
  - Supports multiple deployment methods such as **Docker, Helm Chart, and cloud services**, integrates data collection, monitoring, and alerting into one system, and comes with various monitoring dashboards, quick views, and alert rule templates. **It greatly reduces the construction, learning, and usage costs of cloud-native monitoring systems**.
- **Professional Alerting**
  - Provides visual alert configuration and management, supports various alert rules, offers silence and subscription rules, supports multiple alert delivery channels, and includes alert self-healing and event management.
- **Cloud-Native**
  - Quickly builds an enterprise-grade cloud-native monitoring system through a turnkey approach, supports multiple collectors such as [Categraf](https://github.com/flashcatcloud/categraf), Telegraf, and Grafana-agent, supports multiple data sources such as Prometheus, VictoriaMetrics, M3DB, ElasticSearch, and Jaeger, and is compatible with importing Grafana dashboards. **It seamlessly integrates with the cloud-native ecosystem**.
- **High Performance and High Availability**
  - Thanks to Nightingale's multi-data-source management engine, its solid architecture, and its use of high-performance time-series databases, it can handle data collection, storage, and alert analysis for billions of time series, saving significant cost.
  - Nightingale components can be scaled horizontally with no single point of failure. It has been deployed in thousands of enterprises and hardened in demanding production environments. Many leading Internet companies run Nightingale on clusters of hundreds of nodes, processing billions of time series.
- **Flexible Extension and Centralized Management**
  - Nightingale can be deployed on a 1-core, 1 GB cloud host, across a cluster of hundreds of machines, or inside Kubernetes. Time-series databases, alert engines, and other components can also be decentralized across data centers and regions, balancing edge deployment with centralized management. **It solves the problem of data fragmentation and the lack of unified views**.
## Quick Start

- 👉 [Documentation](https://flashcat.cloud/docs/) | [Download](https://flashcat.cloud/download/nightingale/)
- ❤️ [Report a Bug](https://github.com/ccfos/nightingale/issues/new?assignees=&labels=&projects=&template=question.yml)
- ℹ️ For faster access, the above documentation and download sites are hosted on [FlashcatCloud](https://flashcat.cloud).

#### If you are using Prometheus and have one or more of the following scenarios, we recommend upgrading to Nightingale:

- Multiple systems such as Prometheus, Alertmanager, and Grafana are fragmented, lack a unified view, and cannot be used out of the box;
- Managing Prometheus and Alertmanager by editing configuration files has a steep learning curve and makes collaboration difficult;
- Too much data to scale up your Prometheus cluster;
- Multiple Prometheus clusters running in production, with high management and usage costs;
## Features

- **Integration with Multiple Time-Series Databases:** Supports integration with various time-series databases such as Prometheus, VictoriaMetrics, Thanos, Mimir, M3DB, and TDengine, enabling unified alert management.
- **Advanced Alerting Capabilities:** Comes with built-in support for multiple alerting rules, extensible to common notification channels. It also supports alert suppression, silencing, subscription, self-healing, and alert event management.
- **High-Performance Visualization Engine:** Offers various chart styles with numerous built-in dashboard templates and the ability to import Grafana templates. Ready to use with a business-friendly open-source license.
- **Support for Common Collectors:** Compatible with [Categraf](https://flashcat.cloud/product/categraf), Telegraf, Grafana-agent, Datadog-agent, and various exporters as collectors; there is no data that cannot be monitored.
- **Seamless Integration with [Flashduty](https://flashcat.cloud/product/flashcat-duty/):** Enables alert aggregation, acknowledgment, escalation, scheduling, and IM integration, ensuring no alerts are missed, reducing unnecessary interruptions, and enhancing efficient collaboration.

#### If you are using Zabbix and have the following scenarios, we recommend upgrading to Nightingale:

- Too much monitoring data, and you want a more scalable solution;
- A steep learning curve, and you want more efficient collaboration in a multi-user, multi-team model;
- Microservice and cloud-native architectures with variable monitoring-data lifecycles and high-cardinality monitoring dimensions, which do not map easily onto the Zabbix data model;
#### If you are using [open-falcon](https://github.com/open-falcon/falcon-plus), we recommend upgrading to Nightingale:

- For more about open-falcon and Nightingale, see [Ten features and trends of cloud-native monitoring](https://mp.weixin.qq.com/s?__biz=MzkzNjI5OTM5Nw==&mid=2247483738&idx=1&sn=e8bdbb974a2cd003c1abcc2b5405dd18&chksm=c2a19fb0f5d616a63185cd79277a79a6b80118ef2185890d0683d2bb20451bd9303c78d083c5#rd).

## Getting Started

[https://n9e.github.io/](https://n9e.github.io/)

## Screenshots

You can switch languages and themes in the top right corner. We now support English, Simplified Chinese, and Traditional Chinese.

![n9e-switch-i18n](doc/img/readme/20240513135817.png)
### Instant Query

Similar to the built-in query analysis page in Prometheus, Nightingale offers an ad-hoc query feature with UI enhancements. It also provides built-in PromQL metrics, allowing users unfamiliar with PromQL to quickly perform queries.

![n9e-metric-explorer](doc/img/readme/20240513140224.png)

### Metric View

Alternatively, you can use the Metric View to access data. With this feature, Instant Query becomes less necessary, as it caters more to advanced users. Regular users can easily perform queries using the Metric View.

![n9e-metric-view](doc/img/readme/20240513141029.png)

### Built-in Dashboards

Nightingale includes commonly used dashboards that can be imported and used directly. You can also import Grafana dashboards, although compatibility is limited to basic Grafana charts. If you’re accustomed to Grafana, it’s recommended to continue using it for visualization, with Nightingale serving as an alerting engine.

![n9e-builtin-dashboards](doc/img/readme/20240513141616.png)

### Built-in Alert Rules

In addition to the built-in dashboards, Nightingale also comes with numerous alert rules that are ready to use out of the box.

![n9e-builtin-alerts](doc/img/readme/20240513141742.png)

https://user-images.githubusercontent.com/792850/216888712-2565fcea-9df5-47bd-a49e-d60af9bd76e8.mp4
## Architecture

In most community scenarios, Nightingale is primarily used as an alert engine, integrating with multiple time-series databases to unify alert rule management, while Grafana remains the preferred tool for visualization. As an alert engine, the product architecture of Nightingale is as follows:

<img src="doc/img/arch-product.png" width="600">

![n9e-arch-system](doc/img/readme/20240221152601.png)

Nightingale can receive monitoring data reported by various collectors (such as [Categraf](https://github.com/flashcatcloud/categraf), Telegraf, Grafana-agent, and Prometheus) and write it to various popular time-series databases (such as Prometheus, M3DB, VictoriaMetrics, Thanos, and TDengine). It provides configuration of alert rules, silence rules, and subscription rules, along with the ability to view monitoring data. It also provides automatic alert self-healing (for example, calling back a webhook address or executing a script after an alert fires) and the ability to store, manage, and view historical alert events in groups.

For edge data centers with poor network connectivity to the central Nightingale server, we offer a distributed deployment mode for the alert engine. In this mode, alerting keeps working even if the network is disconnected.

If a standalone time-series database (such as Prometheus) has performance bottlenecks or poor disaster recovery, we recommend [VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics). The VictoriaMetrics architecture is relatively simple, performs excellently, and is easy to deploy and maintain; the architecture diagram is as shown above. For more detailed documentation, see the VictoriaMetrics [official website](https://victoriametrics.com/).

![n9e-arch-vm](doc/img/readme/20240222102119.png)

**We welcome you to participate in the Nightingale open-source project and community in various ways, including but not limited to**:

- Adding and improving documentation => [n9e.github.io](https://n9e.github.io/)
- Sharing your best practices and experience with Nightingale => [Article sharing](https://n9e.github.io/docs/prologue/share/)
- Submitting product suggestions => [GitHub issue](https://github.com/ccfos/nightingale/issues/new?assignees=&labels=kind%2Ffeature&template=enhancement.md)
- Submitting code to make Nightingale faster, more stable, and easier to use => [GitHub pull request](https://github.com/didi/nightingale/pulls)
## Communication Channels

**Respecting, recognizing, and recording the work of every contributor** is the first guiding principle of the Nightingale open-source community. We advocate effective questioning, which both respects developers' time and adds to the community's accumulated knowledge.

- Before asking a question, please first check the [FAQ](https://www.gitlink.org.cn/ccfos/nightingale/wiki/faq)
- We use [GitHub Discussions](https://github.com/ccfos/nightingale/discussions) as the communication forum; you can search and ask questions there.
- We also recommend joining our [Slack channel](https://n9e-talk.slack.com/) to exchange experiences with other Nightingale users.

- **Report Bugs:** It is highly recommended to submit issues via the [Nightingale GitHub Issue tracker](https://github.com/ccfos/nightingale/issues/new?assignees=&labels=kind%2Fbug&projects=&template=bug_report.yml).
- **Documentation:** For more information, we recommend thoroughly browsing the [Nightingale Documentation Site](https://flashcat.cloud/docs/content/flashcat-monitor/nightingale-v7/introduction/).
## Who is using Nightingale

You can register your usage and share your experience by posting on **[Who is Using Nightingale](https://github.com/ccfos/nightingale/issues/897)**.

## Stargazers over time

[![Stargazers over time](https://starchart.cc/ccfos/nightingale.svg)](https://starchart.cc/ccfos/nightingale)

[![Star History Chart](https://api.star-history.com/svg?repos=ccfos/nightingale&type=Date)](https://star-history.com/#ccfos/nightingale&Date)

## Community Co-Building

- ❇️ Please read the [Nightingale Open Source Project and Community Governance Draft](./doc/community-governance.md). We sincerely welcome every user, developer, company, and organization to use Nightingale, actively report bugs, submit feature requests, share best practices, and help build a professional and active open-source community.
- ❤️ Nightingale Contributors

## Contributors

<a href="https://github.com/ccfos/nightingale/graphs/contributors">
  <img src="https://contrib.rocks/image?repo=ccfos/nightingale" />
</a>

## License

- - [Apache License V2.0](https://github.com/didi/nightingale/blob/main/LICENSE)
+ [Apache License V2.0](https://github.com/didi/nightingale/blob/main/LICENSE)
@@ -16,7 +16,6 @@ import (
	"github.com/ccfos/nightingale/v6/alert/sender"
	"github.com/ccfos/nightingale/v6/conf"
	"github.com/ccfos/nightingale/v6/dumper"
	"github.com/ccfos/nightingale/v6/ibex"
	"github.com/ccfos/nightingale/v6/memsto"
	"github.com/ccfos/nightingale/v6/models"
	"github.com/ccfos/nightingale/v6/pkg/ctx"
@@ -27,6 +26,8 @@ import (
	"github.com/ccfos/nightingale/v6/pushgw/writer"
	"github.com/ccfos/nightingale/v6/storage"
	"github.com/ccfos/nightingale/v6/tdengine"

	"github.com/flashcatcloud/ibex/src/cmd/ibex"
)
func Initialize(configDir string, cryptoKey string) (func(), error) {
@@ -40,14 +41,14 @@ func Initialize(configDir string, cryptoKey string) (func(), error) {
		return nil, err
	}

	ctx := ctx.NewContext(context.Background(), nil, false, config.CenterApi)

	var redis storage.Redis
	redis, err = storage.NewRedis(config.Redis)
	if err != nil {
		return nil, err
	}

	ctx := ctx.NewContext(context.Background(), nil, redis, false, config.CenterApi)

	syncStats := memsto.NewSyncStats()
	alertStats := astats.NewSyncStats()

@@ -73,13 +74,13 @@ func Initialize(configDir string, cryptoKey string) (func(), error) {
	rt := router.New(config.HTTP, config.Alert, alertMuteCache, targetCache, busiGroupCache, alertStats, ctx, externalProcessors)

	if config.Ibex.Enable {
		ibex.ServerStart(ctx, false, nil, redis, config.HTTP.APIForService.BasicAuth, config.Alert.Heartbeat, &config.CenterApi, r, nil, config.Ibex, config.HTTP.Port)
		ibex.ServerStart(false, nil, redis, config.HTTP.APIForService.BasicAuth, config.Alert.Heartbeat, &config.CenterApi, r, nil, config.Ibex, config.HTTP.Port)
	}

	rt.Config(r)
	dumper.ConfigRouter(r)

	httpClean := httpx.Init(config.HTTP, context.Background(), r)
	httpClean := httpx.Init(config.HTTP, r)

	return func() {
		logxClean()
@@ -111,5 +112,5 @@ func Start(alertc aconf.Alert, pushgwc pconf.Pushgw, syncStats *memsto.Stats, al
	go consumer.LoopConsume()

	go queue.ReportQueueSize(alertStats)
	go sender.InitEmailSender(ctx, notifyConfigCache)
	go sender.InitEmailSender(notifyConfigCache)
}
@@ -2,7 +2,6 @@ package common

import (
	"fmt"
	"strings"

	"github.com/ccfos/nightingale/v6/models"
)
@@ -35,9 +34,9 @@ func MatchGroupsName(groupName string, groupFilter []models.TagFilter) bool {
func matchTag(value string, filter models.TagFilter) bool {
	switch filter.Func {
	case "==":
		return strings.TrimSpace(filter.Value) == strings.TrimSpace(value)
		return filter.Value == value
	case "!=":
		return strings.TrimSpace(filter.Value) != strings.TrimSpace(value)
		return filter.Value != value
	case "in":
		_, has := filter.Vset[value]
		return has
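The hunk above removes the `strings.TrimSpace` normalization, so `==` and `!=` now compare values exactly. A minimal standalone sketch of the resulting matcher, with a simplified, hypothetical `TagFilter` standing in for `models.TagFilter`:

```go
package main

import "fmt"

// TagFilter mirrors only the fields matchTag uses in the diff above;
// the real models.TagFilter has more fields.
type TagFilter struct {
	Func  string              // "==", "!=", or "in"
	Value string              // used by == and !=
	Vset  map[string]struct{} // used by "in"
}

func matchTag(value string, filter TagFilter) bool {
	switch filter.Func {
	case "==":
		return filter.Value == value // exact match, no trimming
	case "!=":
		return filter.Value != value
	case "in":
		_, has := filter.Vset[value]
		return has
	}
	return false
}

func main() {
	f := TagFilter{Func: "in", Vset: map[string]struct{}{"prod": {}, "staging": {}}}
	fmt.Println(matchTag("prod", f)) // true
	fmt.Println(matchTag("dev", f))  // false
}
```

Note that with the trimming gone, a filter value of `"prod "` no longer matches the tag `"prod"`, which is exactly the behavioral change the diff makes.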
@@ -167,7 +167,7 @@ func (e *Dispatch) HandleEventNotify(event *models.AlertCurEvent, isSubscribe bo
	}

	// Handle event delivery; one goroutine handles all deliveries for a single event.
	go e.Send(rule, event, notifyTarget, isSubscribe)
	go e.Send(rule, event, notifyTarget)

	// If the event did not come from a subscription rule, handle the subscription rules' events for it.
	if !isSubscribe {
@@ -238,12 +238,11 @@ func (e *Dispatch) handleSub(sub *models.AlertSubscribe, event models.AlertCurEv
		e.HandleEventNotify(&event, true)
	}

func (e *Dispatch) Send(rule *models.AlertRule, event *models.AlertCurEvent, notifyTarget *NotifyTarget, isSubscribe bool) {
func (e *Dispatch) Send(rule *models.AlertRule, event *models.AlertCurEvent, notifyTarget *NotifyTarget) {
	needSend := e.BeforeSenderHook(event)
	if needSend {
		for channel, uids := range notifyTarget.ToChannelUserMap() {
			msgCtx := sender.BuildMessageContext(e.ctx, rule, []*models.AlertCurEvent{event},
				uids, e.userCache, e.Astats)
			msgCtx := sender.BuildMessageContext(rule, []*models.AlertCurEvent{event}, uids, e.userCache, e.Astats)
			e.RwLock.RLock()
			s := e.Senders[channel]
			e.RwLock.RUnlock()
@@ -267,19 +266,13 @@ func (e *Dispatch) Send(rule *models.AlertRule, event *models.AlertCurEvent, not

	// handle global webhooks
	if e.alerting.WebhookBatchSend {
		sender.BatchSendWebhooks(e.ctx, notifyTarget.ToWebhookList(), event, e.Astats)
		sender.BatchSendWebhooks(notifyTarget.ToWebhookList(), event, e.Astats)
	} else {
		sender.SingleSendWebhooks(e.ctx, notifyTarget.ToWebhookList(), event, e.Astats)
		sender.SingleSendWebhooks(notifyTarget.ToWebhookList(), event, e.Astats)
	}
	// handle plugin call
	go sender.MayPluginNotify(e.ctx, e.genNoticeBytes(event), e.notifyConfigCache.
		GetNotifyScript(), e.Astats, event)

	if !isSubscribe {
		// handle ibex callbacks
		e.HandleIbex(rule, event)
	}
	go sender.MayPluginNotify(e.genNoticeBytes(event), e.notifyConfigCache.GetNotifyScript(), e.Astats)
}

func (e *Dispatch) SendCallbacks(rule *models.AlertRule, notifyTarget *NotifyTarget, event *models.AlertCurEvent) {
@@ -329,30 +322,6 @@ func (e *Dispatch) SendCallbacks(rule *models.AlertRule, notifyTarget *NotifyTar
	}
}

func (e *Dispatch) HandleIbex(rule *models.AlertRule, event *models.AlertCurEvent) {
	// Parse the RuleConfig field.
	var ruleConfig struct {
		TaskTpls []*models.Tpl `json:"task_tpls"`
	}
	json.Unmarshal([]byte(rule.RuleConfig), &ruleConfig)

	for _, t := range ruleConfig.TaskTpls {
		if t.TplId == 0 {
			continue
		}

		if len(t.Host) == 0 {
			sender.CallIbex(e.ctx, t.TplId, event.TargetIdent,
				e.taskTplsCache, e.targetCache, e.userCache, event)
			continue
		}
		for _, host := range t.Host {
			sender.CallIbex(e.ctx, t.TplId, host,
				e.taskTplsCache, e.targetCache, e.userCache, event)
		}
	}
}
type Notice struct {
	Event *models.AlertCurEvent `json:"event"`
	Tpls  map[string]string     `json:"tpls"`

@@ -46,14 +46,6 @@ const (
	QUERY_DATA = "query_data"
)

type JoinType string

const (
	Left  JoinType = "left"
	Right JoinType = "right"
	Inner JoinType = "inner"
)
func NewAlertRuleWorker(rule *models.AlertRule, datasourceId int64, processor *process.Processor, promClients *prom.PromClientMap, tdengineClients *tdengine.TdengineClientMap, ctx *ctx.Context) *AlertRuleWorker {
	arw := &AlertRuleWorker{
		datasourceId: datasourceId,
@@ -266,12 +258,9 @@ func (arw *AlertRuleWorker) GetTdengineAnomalyPoint(rule *models.AlertRule, dsId
	arw.inhibit = ruleQuery.Inhibit
	if len(ruleQuery.Queries) > 0 {
		seriesStore := make(map[uint64]models.DataResp)
		// Store each query's hash index in its own group.
		seriesTagIndexes := make([]map[uint64][]uint64, 0)
		seriesTagIndex := make(map[uint64][]uint64)

		for _, query := range ruleQuery.Queries {
			seriesTagIndex := make(map[uint64][]uint64)

			arw.processor.Stats.CounterQueryDataTotal.WithLabelValues(fmt.Sprintf("%d", arw.datasourceId)).Inc()
			cli := arw.tdengineClients.GetCli(dsId)
			if cli == nil {
@@ -289,13 +278,13 @@ func (arw *AlertRuleWorker) GetTdengineAnomalyPoint(rule *models.AlertRule, dsId
				arw.processor.Stats.CounterRuleEvalErrorTotal.WithLabelValues(fmt.Sprintf("%v", arw.processor.DatasourceId()), QUERY_DATA).Inc()
				continue
			}

			// This log line matters: it records the live values used for alert evaluation.
			logger.Debugf("rule_eval rid:%d req:%+v resp:%+v", rule.Id, query, series)
			MakeSeriesMap(series, seriesTagIndex, seriesStore)
			seriesTagIndexes = append(seriesTagIndexes, seriesTagIndex)
		}

		points, recoverPoints = GetAnomalyPoint(rule.Id, ruleQuery, seriesTagIndexes, seriesStore)
		points, recoverPoints = GetAnomalyPoint(rule.Id, ruleQuery, seriesTagIndex, seriesStore)
	}

	return points, recoverPoints
@@ -455,76 +444,17 @@ func (arw *AlertRuleWorker) GetHostAnomalyPoint(ruleConfig string) []common.Anom
	return lst
}

func GetAnomalyPoint(ruleId int64, ruleQuery models.RuleQuery, seriesTagIndexes []map[uint64][]uint64, seriesStore map[uint64]models.DataResp) ([]common.AnomalyPoint, []common.AnomalyPoint) {
func GetAnomalyPoint(ruleId int64, ruleQuery models.RuleQuery, seriesTagIndex map[uint64][]uint64, seriesStore map[uint64]models.DataResp) ([]common.AnomalyPoint, []common.AnomalyPoint) {
	points := []common.AnomalyPoint{}
	recoverPoints := []common.AnomalyPoint{}

	if len(ruleQuery.Triggers) == 0 {
		return points, recoverPoints
	}

	for _, trigger := range ruleQuery.Triggers {
		// The keys of seriesTagIndex are used only for grouping; each value holds the series hashes of that group.
		seriesTagIndex := make(map[uint64][]uint64)

		if len(trigger.Joins) == 0 {
			// No join conditions: fall back to the original logic.
			last := seriesTagIndexes[0]
			for i := 1; i < len(seriesTagIndexes); i++ {
				last = originalJoin(last, seriesTagIndexes[i])
			}
			seriesTagIndex = last
		} else {
			// Join conditions present: merge the query results one condition at a time.
			if len(seriesTagIndexes) != len(trigger.Joins)+1 {
				logger.Errorf("rule_eval rid:%d queries' count: %d not match join condition's count: %d", ruleId, len(seriesTagIndexes), len(trigger.Joins))
				continue
			}

			last := seriesTagIndexes[0]
			lastRehashed := rehashSet(last, seriesStore, trigger.Joins[0].On)
			for i := range trigger.Joins {
				cur := seriesTagIndexes[i+1]
				switch trigger.Joins[i].JoinType {
				case "original":
					last = originalJoin(last, cur)
				case "none":
					last = noneJoin(last, cur)
				case "cartesian":
					last = cartesianJoin(last, cur)
				case "inner_join":
					curRehashed := rehashSet(cur, seriesStore, trigger.Joins[i].On)
					lastRehashed = onJoin(lastRehashed, curRehashed, Inner)
					last = flatten(lastRehashed)
				case "left_join":
					curRehashed := rehashSet(cur, seriesStore, trigger.Joins[i].On)
					lastRehashed = onJoin(lastRehashed, curRehashed, Left)
					last = flatten(lastRehashed)
				case "right_join":
					curRehashed := rehashSet(cur, seriesStore, trigger.Joins[i].On)
					lastRehashed = onJoin(curRehashed, lastRehashed, Right)
					last = flatten(lastRehashed)
				case "left_exclude":
					curRehashed := rehashSet(cur, seriesStore, trigger.Joins[i].On)
					lastRehashed = exclude(lastRehashed, curRehashed)
					last = flatten(lastRehashed)
				case "right_exclude":
					curRehashed := rehashSet(cur, seriesStore, trigger.Joins[i].On)
					lastRehashed = exclude(curRehashed, lastRehashed)
					last = flatten(lastRehashed)
				default:
					logger.Warningf("rule_eval rid:%d join type:%s not support", ruleId, trigger.Joins[i].JoinType)
				}
			}
			seriesTagIndex = last
		}

		for _, seriesHash := range seriesTagIndex {
			sort.Slice(seriesHash, func(i, j int) bool {
				return seriesHash[i] < seriesHash[j]
			})

			m := make(map[string]interface{})
			m := make(map[string]float64)
			var ts int64
			var sample models.DataResp
			var value float64
@@ -589,143 +519,6 @@ func GetAnomalyPoint(ruleId int64, ruleQuery models.RuleQuery, seriesTagIndexes
	return points, recoverPoints
}
func flatten(rehashed map[uint64][][]uint64) map[uint64][]uint64 {
	seriesTagIndex := make(map[uint64][]uint64)
	var i uint64
	for _, HashTagIndex := range rehashed {
		for u := range HashTagIndex {
			seriesTagIndex[i] = HashTagIndex[u]
			i++
		}
	}
	return seriesTagIndex
}
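`flatten` collapses the two-level rehash grouping back into the flat `map[uint64][]uint64` shape used by the join functions, giving every inner group a fresh sequential key. A standalone copy with a tiny worked example:

```go
package main

import "fmt"

// flatten, as in the code above: each inner []uint64 group becomes
// its own entry under a fresh sequential key; the original rehash
// keys are discarded.
func flatten(rehashed map[uint64][][]uint64) map[uint64][]uint64 {
	seriesTagIndex := make(map[uint64][]uint64)
	var i uint64
	for _, groups := range rehashed {
		for u := range groups {
			seriesTagIndex[i] = groups[u]
			i++
		}
	}
	return seriesTagIndex
}

func main() {
	// Two rehash buckets holding two groups and one group respectively.
	in := map[uint64][][]uint64{
		7: {{1, 2}, {3}},
		9: {{4}},
	}
	out := flatten(in)
	fmt.Println(len(out)) // 3 groups in total
}
```

Because map iteration order in Go is unspecified, the key assigned to each group varies between runs; only the set of groups is stable, which is all the callers rely on.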
// onJoin combines two sets that have already been rehashed.
// For example, query A, grouped by rehash on data_base:
// [[A1{data_base=1, table=alert},A2{data_base=1, table=alert}],[A5{data_base=1, table=board}]]
// [[A3{data_base=2, table=board}],[A4{data_base=2, table=alert}]]
// Query B, grouped by rehash on data_base:
// [[B1{data_base=1, table=alert}]]
// [[B2{data_base=2, table=alert}]]
// An inner join yields:
// [[A1{data_base=1, table=alert},A2{data_base=1, table=alert},B1{data_base=1, table=alert}],[A5{data_base=1, table=board},B1{data_base=1, table=alert}]]
// [[A3{data_base=2, table=board},B2{data_base=2, table=alert}],[A4{data_base=2, table=alert},B2{data_base=2, table=alert}]]
func onJoin(reHashTagIndex1 map[uint64][][]uint64, reHashTagIndex2 map[uint64][][]uint64, joinType JoinType) map[uint64][][]uint64 {
	reHashTagIndex := make(map[uint64][][]uint64)
	for rehash := range reHashTagIndex1 {
		if _, ok := reHashTagIndex2[rehash]; ok {
			// Records sharing the same rehash value are merged pairwise.
			for i1 := range reHashTagIndex1[rehash] {
				for i2 := range reHashTagIndex2[rehash] {
					reHashTagIndex[rehash] = append(reHashTagIndex[rehash], mergeNewArray(reHashTagIndex1[rehash][i1], reHashTagIndex2[rehash][i2]))
				}
			}
		} else {
			// For join types other than inner, keep the unmatched records from reHashTagIndex1.
			if joinType != Inner {
				reHashTagIndex[rehash] = reHashTagIndex1[rehash]
			}
		}
	}
	return reHashTagIndex
}
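The keep-or-drop choice for unmatched rehash keys is the whole difference between the inner and left semantics of `onJoin`. A simplified standalone sketch, assuming a `bool` flag in place of the real `JoinType` argument and a local `mergeNewArray` helper:

```go
package main

import "fmt"

// mergeNewArray concatenates two hash groups into a fresh slice.
func mergeNewArray(a, b []uint64) []uint64 {
	return append(append([]uint64{}, a...), b...)
}

// onJoin, simplified: inner=true drops left-side rehash keys with no
// match on the right; inner=false (left-style) keeps them as-is.
func onJoin(left, right map[uint64][][]uint64, inner bool) map[uint64][][]uint64 {
	out := make(map[uint64][][]uint64)
	for rehash := range left {
		if _, ok := right[rehash]; ok {
			// Matching rehash keys: merge groups pairwise.
			for _, g1 := range left[rehash] {
				for _, g2 := range right[rehash] {
					out[rehash] = append(out[rehash], mergeNewArray(g1, g2))
				}
			}
		} else if !inner {
			// Left-style join keeps unmatched left groups.
			out[rehash] = left[rehash]
		}
	}
	return out
}

func main() {
	a := map[uint64][][]uint64{1: {{10}}, 2: {{20}}}
	b := map[uint64][][]uint64{1: {{11}}}
	fmt.Println(len(onJoin(a, b, true)))  // inner: only rehash key 1 survives
	fmt.Println(len(onJoin(a, b, false))) // left: rehash key 2 is kept too
}
```

The real code obtains the right-join behavior by swapping the argument order and passing `Right`, which this sketch omits.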
|
||||
|
// rehashSet regroups series under a new hash.
// For example, query A returns five records:
// A1{data_base=1, table=alert}
// A2{data_base=1, table=alert}
// A3{data_base=2, table=board}
// A4{data_base=2, table=alert}
// A5{data_base=1, table=board}
// After preprocessing (grouping by series, already done before GetAnomalyPoint is entered) they fall into 4 groups:
// [A1{data_base=1, table=alert},A2{data_base=1, table=alert}]
// [A3{data_base=2, table=board}]
// [A4{data_base=2, table=alert}]
// [A5{data_base=1, table=board}]
// If rehashSet regroups by data_base, the result is a two-dimensional array keyed by rehash value;
// records sharing a rehash value are grouped together without being fully merged:
// [[A1{data_base=1, table=alert},A2{data_base=1, table=alert}],[A5{data_base=1, table=board}]]
// [[A3{data_base=2, table=board}],[A4{data_base=2, table=alert}]]
func rehashSet(seriesTagIndex1 map[uint64][]uint64, seriesStore map[uint64]models.DataResp, on []string) map[uint64][][]uint64 {
	reHashTagIndex := make(map[uint64][][]uint64)
	for _, seriesHashes := range seriesTagIndex1 {
		if len(seriesHashes) == 0 {
			continue
		}
		series, exists := seriesStore[seriesHashes[0]]
		if !exists {
			continue
		}
		rehash := hash.GetTargetTagHash(series.Metric, on)
		if _, ok := reHashTagIndex[rehash]; !ok {
			reHashTagIndex[rehash] = make([][]uint64, 0)
		}
		reHashTagIndex[rehash] = append(reHashTagIndex[rehash], seriesHashes)
	}
	return reHashTagIndex
}

// cartesianJoin merges the two query results pairwise (Cartesian product).
func cartesianJoin(seriesTagIndex1 map[uint64][]uint64, seriesTagIndex2 map[uint64][]uint64) map[uint64][]uint64 {
	var index uint64
	seriesTagIndex := make(map[uint64][]uint64)
	for _, seriesHashes1 := range seriesTagIndex1 {
		for _, seriesHashes2 := range seriesTagIndex2 {
			seriesTagIndex[index] = mergeNewArray(seriesHashes1, seriesHashes2)
			index++
		}
	}
	return seriesTagIndex
}

// noneJoin simply concatenates the two sets of groups.
func noneJoin(seriesTagIndex1 map[uint64][]uint64, seriesTagIndex2 map[uint64][]uint64) map[uint64][]uint64 {
	seriesTagIndex := make(map[uint64][]uint64)
	var index uint64
	for _, seriesHashes := range seriesTagIndex1 {
		seriesTagIndex[index] = seriesHashes
		index++
	}
	for _, seriesHashes := range seriesTagIndex2 {
		seriesTagIndex[index] = seriesHashes
		index++
	}
	return seriesTagIndex
}

// originalJoin is the original grouping scheme: records with the same key (i.e. all labels identical) fall into one group.
func originalJoin(seriesTagIndex1 map[uint64][]uint64, seriesTagIndex2 map[uint64][]uint64) map[uint64][]uint64 {
	seriesTagIndex := make(map[uint64][]uint64)
	for tagHash, seriesHashes := range seriesTagIndex1 {
		if _, ok := seriesTagIndex[tagHash]; !ok {
			seriesTagIndex[tagHash] = mergeNewArray(seriesHashes)
		} else {
			seriesTagIndex[tagHash] = append(seriesTagIndex[tagHash], seriesHashes...)
		}
	}

	for tagHash, seriesHashes := range seriesTagIndex2 {
		if _, ok := seriesTagIndex[tagHash]; !ok {
			seriesTagIndex[tagHash] = mergeNewArray(seriesHashes)
		} else {
			seriesTagIndex[tagHash] = append(seriesTagIndex[tagHash], seriesHashes...)
		}
	}

	return seriesTagIndex
}

// exclude performs a left exclusion: keep the records that are in reHashTagIndex1 but not in reHashTagIndex2.
func exclude(reHashTagIndex1 map[uint64][][]uint64, reHashTagIndex2 map[uint64][][]uint64) map[uint64][][]uint64 {
	reHashTagIndex := make(map[uint64][][]uint64)
	for rehash := range reHashTagIndex1 {
		if _, ok := reHashTagIndex2[rehash]; !ok {
			reHashTagIndex[rehash] = reHashTagIndex1[rehash]
		}
	}
	return reHashTagIndex
}

func MakeSeriesMap(series []models.DataResp, seriesTagIndex map[uint64][]uint64, seriesStore map[uint64]models.DataResp) {
	for i := 0; i < len(series); i++ {
		serieHash := hash.GetHash(series[i].Metric, series[i].Ref)
@@ -739,11 +532,3 @@ func MakeSeriesMap(series []models.DataResp, seriesTagIndex map[uint64][]uint64,
		seriesTagIndex[tagHash] = append(seriesTagIndex[tagHash], serieHash)
	}
}

func mergeNewArray(arg ...[]uint64) []uint64 {
	res := make([]uint64, 0)
	for _, a := range arg {
		res = append(res, a...)
	}
	return res
}

@@ -1,271 +0,0 @@
package eval

import (
	"reflect"
	"testing"

	"golang.org/x/exp/slices"
)

var (
	reHashTagIndex1 = map[uint64][][]uint64{
		1: {
			{1, 2}, {3, 4},
		},
		2: {
			{5, 6}, {7, 8},
		},
	}
	reHashTagIndex2 = map[uint64][][]uint64{
		1: {
			{9, 10}, {11, 12},
		},
		3: {
			{13, 14}, {15, 16},
		},
	}
	seriesTagIndex1 = map[uint64][]uint64{
		1: {1, 2, 3, 4},
		2: {5, 6, 7, 8},
	}
	seriesTagIndex2 = map[uint64][]uint64{
		1: {9, 10, 11, 12},
		3: {13, 14, 15, 16},
	}
)

func Test_originalJoin(t *testing.T) {
	type args struct {
		seriesTagIndex1 map[uint64][]uint64
		seriesTagIndex2 map[uint64][]uint64
	}
	tests := []struct {
		name string
		args args
		want map[uint64][]uint64
	}{
		{
			name: "original join",
			args: args{
				seriesTagIndex1: map[uint64][]uint64{
					1: {1, 2, 3, 4},
					2: {5, 6, 7, 8},
				},
				seriesTagIndex2: map[uint64][]uint64{
					1: {9, 10, 11, 12},
					3: {13, 14, 15, 16},
				},
			},
			want: map[uint64][]uint64{
				1: {1, 2, 3, 4, 9, 10, 11, 12},
				2: {5, 6, 7, 8},
				3: {13, 14, 15, 16},
			},
		},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			if got := originalJoin(tt.args.seriesTagIndex1, tt.args.seriesTagIndex2); !reflect.DeepEqual(got, tt.want) {
				t.Errorf("originalJoin() = %v, want %v", got, tt.want)
			}
		})
	}
}

func Test_exclude(t *testing.T) {
	type args struct {
		reHashTagIndex1 map[uint64][][]uint64
		reHashTagIndex2 map[uint64][][]uint64
	}
	tests := []struct {
		name string
		args args
		want map[uint64][]uint64
	}{
		{
			name: "left exclude",
			args: args{
				reHashTagIndex1: reHashTagIndex1,
				reHashTagIndex2: reHashTagIndex2,
			},
			want: map[uint64][]uint64{
				0: {5, 6},
				1: {7, 8},
			},
		},
		{
			name: "right exclude",
			args: args{
				reHashTagIndex1: reHashTagIndex2,
				reHashTagIndex2: reHashTagIndex1,
			},
			want: map[uint64][]uint64{
				3: {13, 14},
				4: {15, 16},
			},
		},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			if got := exclude(tt.args.reHashTagIndex1, tt.args.reHashTagIndex2); !allValueDeepEqual(flatten(got), tt.want) {
				t.Errorf("exclude() = %v, want %v", got, tt.want)
			}
		})
	}
}

func Test_noneJoin(t *testing.T) {
	type args struct {
		seriesTagIndex1 map[uint64][]uint64
		seriesTagIndex2 map[uint64][]uint64
	}
	tests := []struct {
		name string
		args args
		want map[uint64][]uint64
	}{
		{
			name: "none join, direct splicing",
			args: args{
				seriesTagIndex1: seriesTagIndex1,
				seriesTagIndex2: seriesTagIndex2,
			},
			want: map[uint64][]uint64{
				0: {1, 2, 3, 4},
				1: {5, 6, 7, 8},
				2: {9, 10, 11, 12},
				3: {13, 14, 15, 16},
			},
		},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			if got := noneJoin(tt.args.seriesTagIndex1, tt.args.seriesTagIndex2); !allValueDeepEqual(got, tt.want) {
				t.Errorf("noneJoin() = %v, want %v", got, tt.want)
			}
		})
	}
}

func Test_cartesianJoin(t *testing.T) {
	type args struct {
		seriesTagIndex1 map[uint64][]uint64
		seriesTagIndex2 map[uint64][]uint64
	}
	tests := []struct {
		name string
		args args
		want map[uint64][]uint64
	}{
		{
			name: "cartesian join",
			args: args{
				seriesTagIndex1: seriesTagIndex1,
				seriesTagIndex2: seriesTagIndex2,
			},
			want: map[uint64][]uint64{
				0: {1, 2, 3, 4, 9, 10, 11, 12},
				1: {5, 6, 7, 8, 9, 10, 11, 12},
				2: {5, 6, 7, 8, 13, 14, 15, 16},
				3: {1, 2, 3, 4, 13, 14, 15, 16},
			},
		},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			if got := cartesianJoin(tt.args.seriesTagIndex1, tt.args.seriesTagIndex2); !allValueDeepEqual(got, tt.want) {
				t.Errorf("cartesianJoin() = %v, want %v", got, tt.want)
			}
		})
	}
}

func Test_onJoin(t *testing.T) {
	type args struct {
		reHashTagIndex1 map[uint64][][]uint64
		reHashTagIndex2 map[uint64][][]uint64
		joinType        JoinType
	}
	tests := []struct {
		name string
		args args
		want map[uint64][]uint64
	}{
		{
			name: "left join",
			args: args{
				reHashTagIndex1: reHashTagIndex1,
				reHashTagIndex2: reHashTagIndex2,
				joinType:        Left,
			},
			want: map[uint64][]uint64{
				1: {1, 2, 9, 10},
				2: {3, 4, 9, 10},
				3: {1, 2, 11, 12},
				4: {3, 4, 11, 12},
				5: {5, 6},
				6: {7, 8},
			},
		},
		{
			name: "right join",
			args: args{
				reHashTagIndex1: reHashTagIndex2,
				reHashTagIndex2: reHashTagIndex1,
				joinType:        Right,
			},
			want: map[uint64][]uint64{
				1: {1, 2, 9, 10},
				2: {3, 4, 9, 10},
				3: {1, 2, 11, 12},
				4: {3, 4, 11, 12},
				5: {13, 14},
				6: {15, 16},
			},
		},

		{
			name: "inner join",
			args: args{
				reHashTagIndex1: reHashTagIndex1,
				reHashTagIndex2: reHashTagIndex2,
				joinType:        Inner,
			},
			want: map[uint64][]uint64{
				1: {1, 2, 9, 10},
				2: {3, 4, 9, 10},
				3: {1, 2, 11, 12},
				4: {3, 4, 11, 12},
			},
		},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			if got := onJoin(tt.args.reHashTagIndex1, tt.args.reHashTagIndex2, tt.args.joinType); !allValueDeepEqual(flatten(got), tt.want) {
				t.Errorf("onJoin() = %v, want %v", got, tt.want)
			}
		})
	}
}

// allValueDeepEqual reports whether the values of two maps match, ignoring the keys.
func allValueDeepEqual(got, want map[uint64][]uint64) bool {
	if len(got) != len(want) {
		return false
	}
	for _, v1 := range got {
		curEqual := false
		slices.Sort(v1)
		for _, v2 := range want {
			slices.Sort(v2)
			if reflect.DeepEqual(v1, v2) {
				curEqual = true
				break
			}
		}
		if !curEqual {
			return false
		}
	}
	return true
}
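The tests in this file also call a `flatten` helper that does not appear in this hunk. A minimal standalone sketch, consistent with the expected `want` values above (each inner `[]uint64` group becomes its own entry under a fresh sequential key); the actual upstream implementation may differ:

```go
package main

import "fmt"

// flatten renumbers the nested groups of a rehash index so the result can
// be compared against a flat map[uint64][]uint64 "want" value: every inner
// []uint64 group becomes its own entry under a fresh sequential key.
func flatten(in map[uint64][][]uint64) map[uint64][]uint64 {
	out := make(map[uint64][]uint64)
	var index uint64
	for _, groups := range in {
		for _, g := range groups {
			out[index] = g
			index++
		}
	}
	return out
}

func main() {
	got := flatten(map[uint64][][]uint64{2: {{5, 6}, {7, 8}}})
	fmt.Println(len(got)) // prints 2: each inner group gets its own key
}
```

This matches how the "left exclude" case turns `reHashTagIndex1[2] = [[5,6],[7,8]]` into the two flat entries `{5, 6}` and `{7, 8}`.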

@@ -53,11 +53,10 @@ type Processor struct {
	datasourceId int64
	EngineName   string

	rule                 *models.AlertRule
	fires                *AlertCurEventMap
	pendings             *AlertCurEventMap
	pendingsUseByRecover *AlertCurEventMap
	inhibit              bool
	rule     *models.AlertRule
	fires    *AlertCurEventMap
	pendings *AlertCurEventMap
	inhibit  bool

	tagsMap map[string]string
	tagsArr []string
@@ -233,17 +232,17 @@ func Relabel(rule *models.AlertRule, event *models.AlertCurEvent) {
		return
	}

	if len(rule.EventRelabelConfig) == 0 {
		return
	}

	// need to keep the original label
	event.OriginalTags = event.Tags
	event.OriginalTagsJSON = make([]string, len(event.TagsJSON))

	labels := make([]prompb.Label, len(event.TagsJSON))
	for i, tag := range event.TagsJSON {
		label := strings.SplitN(tag, "=", 2)
		label := strings.Split(tag, "=")
		if len(label) != 2 {
			logger.Errorf("event%+v relabel: the label length is not 2:%v", event, label)
			continue
		}
		event.OriginalTagsJSON[i] = tag
		labels[i] = prompb.Label{Name: label[0], Value: label[1]}
	}
@@ -343,22 +342,11 @@ func (p *Processor) RecoverSingle(hash string, now int64, value *string, values
	if !has {
		return
	}

	// If a recovery observation duration is configured, do not recover immediately.
	if cachedRule.RecoverDuration > 0 {
		lastPendingEvent, has := p.pendingsUseByRecover.Get(hash)
		if !has {
			// No anomaly point was ever produced, so there is nothing to recover.
			logger.Debugf("rule_eval:%s event:%v do not has pending event, not recover", p.Key(), event)
			return
		}

		if now-lastPendingEvent.LastEvalTime < cachedRule.RecoverDuration {
			logger.Debugf("rule_eval:%s event:%v not recover", p.Key(), event)
			return
		}
	if cachedRule.RecoverDuration > 0 && now-event.LastEvalTime < cachedRule.RecoverDuration {
		logger.Debugf("rule_eval:%s event:%v not recover", p.Key(), event)
		return
	}

	if value != nil {
		event.TriggerValue = *value
		if len(values) > 0 {
@@ -370,7 +358,6 @@ func (p *Processor) RecoverSingle(hash string, now int64, value *string, values
	// I honestly cannot tell whether prom had values that simply failed the threshold and were not returned, or prom actually lost some points and had nothing to return. Awkward.
	p.fires.Delete(hash)
	p.pendings.Delete(hash)
	p.pendingsUseByRecover.Delete(hash)

	// The recovery may have happened because the promql was adjusted, so the event should carry the latest promql; otherwise users get confused.
	// In fact, every field of the rule may have changed, so update them all.
@@ -390,13 +377,6 @@ func (p *Processor) handleEvent(events []*models.AlertCurEvent) {
		if event == nil {
			continue
		}

		if _, has := p.pendingsUseByRecover.Get(event.Hash); has {
			p.pendingsUseByRecover.UpdateLastEvalTime(event.Hash, event.LastEvalTime)
		} else {
			p.pendingsUseByRecover.Set(event.Hash, event)
		}

		if p.rule.PromForDuration == 0 {
			fireEvents = append(fireEvents, event)
			if severity > event.Severity {
@@ -405,7 +385,7 @@ func (p *Processor) handleEvent(events []*models.AlertCurEvent) {
			continue
		}

		var preTriggerTime int64 // trigger time of the first pending event
		var preTriggerTime int64
		preEvent, has := p.pendings.Get(event.Hash)
		if has {
			p.pendings.UpdateLastEvalTime(event.Hash, event.LastEvalTime)
@@ -507,7 +487,6 @@ func (p *Processor) RecoverAlertCurEventFromDb() {
	}

	fireMap := make(map[string]*models.AlertCurEvent)
	pendingsUseByRecoverMap := make(map[string]*models.AlertCurEvent)
	for _, event := range curEvents {
		if event.Cate == models.HOST {
			target, exists := p.TargetCache.Get(event.TargetIdent)
@@ -519,14 +498,9 @@ func (p *Processor) RecoverAlertCurEventFromDb() {

		event.DB2Mem()
		fireMap[event.Hash] = event
		e := *event
		pendingsUseByRecoverMap[event.Hash] = &e
	}

	p.fires = NewAlertCurEventMap(fireMap)

	// After an alert rule is modified, or after the process restarts, pendingsUseByRecover must be reloaded.
	p.pendingsUseByRecover = NewAlertCurEventMap(pendingsUseByRecoverMap)
}

func (p *Processor) fillTags(anomalyPoint common.AnomalyPoint) {
@@ -602,7 +576,6 @@ func (p *Processor) mayHandleGroup() {
func (p *Processor) DeleteProcessEvent(hash string) {
	p.fires.Delete(hash)
	p.pendings.Delete(hash)
	p.pendingsUseByRecover.Delete(hash)
}

func labelMapToArr(m map[string]string) []string {

@@ -125,7 +125,7 @@ func (c *DefaultCallBacker) CallBack(ctx CallBackContext) {
		Batch: 1000,
	}

	PushCallbackEvent(ctx.Ctx, webhookConf, event, ctx.Stats)
	PushCallbackEvent(webhookConf, event, ctx.Stats)
	return
}

@@ -141,46 +141,16 @@ func (c *DefaultCallBacker) CallBack(ctx CallBackContext) {
	}
}

func doSendAndRecord(ctx *ctx.Context, url, token string, body interface{}, channel string,
	stats *astats.Stats, event *models.AlertCurEvent) {
	res, err := doSend(url, body, channel, stats)
	NotifyRecord(ctx, event, channel, token, res, err)
}

func NotifyRecord(ctx *ctx.Context, evt *models.AlertCurEvent, channel, target, res string, err error) {
	noti := models.NewNotificationRecord(evt, channel, target)
	if err != nil {
		noti.SetStatus(models.NotiStatusFailure)
		noti.SetDetails(err.Error())
	} else if res != "" {
		noti.SetDetails(res)
	}

	if !ctx.IsCenter {
		_, err := poster.PostByUrlsWithResp[int64](ctx, "/v1/n9e/notify-record", noti)
		if err != nil {
			logger.Errorf("add noti:%v failed, err: %v", noti, err)
		}
		return
	}

	if err := noti.Add(ctx); err != nil {
		logger.Errorf("add noti:%v failed, err: %v", noti, err)
	}
}

func doSend(url string, body interface{}, channel string, stats *astats.Stats) (string, error) {
func doSend(url string, body interface{}, channel string, stats *astats.Stats) {
	stats.AlertNotifyTotal.WithLabelValues(channel).Inc()

	res, code, err := poster.PostJSON(url, time.Second*5, body, 3)
	if err != nil {
		logger.Errorf("%s_sender: result=fail url=%s code=%d error=%v req:%v response=%s", channel, url, code, err, body, string(res))
		stats.AlertNotifyErrorTotal.WithLabelValues(channel).Inc()
		return "", err
	} else {
		logger.Infof("%s_sender: result=succ url=%s code=%d req:%v response=%s", channel, url, code, body, string(res))
	}

	logger.Infof("%s_sender: result=succ url=%s code=%d req:%v response=%s", channel, url, code, body, string(res))
	return string(res), nil
}

type TaskCreateReply struct {
@@ -188,7 +158,7 @@ type TaskCreateReply struct {
	Dat int64 `json:"dat"` // task.id
}

func PushCallbackEvent(ctx *ctx.Context, webhook *models.Webhook, event *models.AlertCurEvent, stats *astats.Stats) {
func PushCallbackEvent(webhook *models.Webhook, event *models.AlertCurEvent, stats *astats.Stats) {
	CallbackEventQueueLock.RLock()
	queue := CallbackEventQueue[webhook.Url]
	CallbackEventQueueLock.RUnlock()
@@ -203,7 +173,7 @@ func PushCallbackEvent(ctx *ctx.Context, webhook *models.Webhook, event *models.
		CallbackEventQueue[webhook.Url] = queue
		CallbackEventQueueLock.Unlock()

		StartConsumer(ctx, queue, webhook.Batch, webhook, stats)
		StartConsumer(queue, webhook.Batch, webhook, stats)
	}

	succ := queue.list.PushFront(event)

@@ -1,10 +1,9 @@
package sender

import (
	"github.com/ccfos/nightingale/v6/models"
	"html/template"
	"strings"

	"github.com/ccfos/nightingale/v6/models"
)

type dingtalkMarkdown struct {
@@ -36,13 +35,13 @@ func (ds *DingtalkSender) Send(ctx MessageContext) {
		return
	}

	urls, ats, tokens := ds.extract(ctx.Users)
	urls, ats := ds.extract(ctx.Users)
	if len(urls) == 0 {
		return
	}
	message := BuildTplMessage(models.Dingtalk, ds.tpl, ctx.Events)

	for i, url := range urls {
	for _, url := range urls {
		var body dingtalk
		// NoAt in url
		if strings.Contains(url, "noat=1") {
@@ -67,7 +66,7 @@ func (ds *DingtalkSender) Send(ctx MessageContext) {
			}
		}

		doSendAndRecord(ctx.Ctx, url, tokens[i], body, models.Dingtalk, ctx.Stats, ctx.Events[0])
		doSend(url, body, models.Dingtalk, ctx.Stats)
	}
}

@@ -97,17 +96,15 @@ func (ds *DingtalkSender) CallBack(ctx CallBackContext) {
		body.Markdown.Text = message
	}

	doSendAndRecord(ctx.Ctx, ctx.CallBackURL, ctx.CallBackURL, body,
		"callback", ctx.Stats, ctx.Events[0])
	doSend(ctx.CallBackURL, body, models.Dingtalk, ctx.Stats)

	ctx.Stats.AlertNotifyTotal.WithLabelValues("rule_callback").Inc()
}

// extract urls and ats from Users
func (ds *DingtalkSender) extract(users []*models.User) ([]string, []string, []string) {
func (ds *DingtalkSender) extract(users []*models.User) ([]string, []string) {
	urls := make([]string, 0, len(users))
	ats := make([]string, 0, len(users))
	tokens := make([]string, 0, len(users))

	for _, user := range users {
		if user.Phone != "" {
@@ -119,8 +116,7 @@ func (ds *DingtalkSender) extract(users []*models.User) ([]string, []string, []s
				url = "https://oapi.dingtalk.com/robot/send?access_token=" + token
			}
			urls = append(urls, url)
			tokens = append(tokens, token)
		}
	}
	return urls, ats, tokens
	return urls, ats
}

@@ -9,14 +9,13 @@ import (
	"github.com/ccfos/nightingale/v6/alert/aconf"
	"github.com/ccfos/nightingale/v6/memsto"
	"github.com/ccfos/nightingale/v6/models"
	"github.com/ccfos/nightingale/v6/pkg/ctx"

	"github.com/toolkits/pkg/logger"

	"gopkg.in/gomail.v2"
)

var mailch chan *EmailContext
var mailch chan *gomail.Message

type EmailSender struct {
	subjectTpl *template.Template
@@ -24,11 +23,6 @@ type EmailSender struct {
	smtp aconf.SMTPConfig
}

type EmailContext struct {
	event *models.AlertCurEvent
	mail  *gomail.Message
}

func (es *EmailSender) Send(ctx MessageContext) {
	if len(ctx.Users) == 0 || len(ctx.Events) == 0 {
		return
@@ -42,7 +36,7 @@ func (es *EmailSender) Send(ctx MessageContext) {
		subject = ctx.Events[0].RuleName
	}
	content := BuildTplMessage(models.Email, es.contentTpl, ctx.Events)
	es.WriteEmail(subject, content, tos, ctx.Events[0])
	es.WriteEmail(subject, content, tos)

	ctx.Stats.AlertNotifyTotal.WithLabelValues(models.Email).Add(float64(len(tos)))
}
@@ -79,8 +73,7 @@ func SendEmail(subject, content string, tos []string, stmp aconf.SMTPConfig) err
	return nil
}

func (es *EmailSender) WriteEmail(subject, content string, tos []string,
	event *models.AlertCurEvent) {
func (es *EmailSender) WriteEmail(subject, content string, tos []string) {
	m := gomail.NewMessage()

	m.SetHeader("From", es.smtp.From)
@@ -88,7 +81,7 @@ func (es *EmailSender) WriteEmail(subject, content string, tos []string,
	m.SetHeader("Subject", subject)
	m.SetBody("text/html", content)

	mailch <- &EmailContext{event, m}
	mailch <- m
}

func dialSmtp(d *gomail.Dialer) gomail.SendCloser {
@@ -111,22 +104,22 @@ func dialSmtp(d *gomail.Dialer) gomail.SendCloser {

var mailQuit = make(chan struct{})

func RestartEmailSender(ctx *ctx.Context, smtp aconf.SMTPConfig) {
func RestartEmailSender(smtp aconf.SMTPConfig) {
	// Notify internal start exit
	mailQuit <- struct{}{}
	startEmailSender(ctx, smtp)
	startEmailSender(smtp)
}

var smtpConfig aconf.SMTPConfig

func InitEmailSender(ctx *ctx.Context, ncc *memsto.NotifyConfigCacheType) {
	mailch = make(chan *EmailContext, 100000)
	go updateSmtp(ctx, ncc)
func InitEmailSender(ncc *memsto.NotifyConfigCacheType) {
	mailch = make(chan *gomail.Message, 100000)
	go updateSmtp(ncc)
	smtpConfig = ncc.GetSMTP()
	startEmailSender(ctx, smtpConfig)
	startEmailSender(smtpConfig)
}

func updateSmtp(ctx *ctx.Context, ncc *memsto.NotifyConfigCacheType) {
func updateSmtp(ncc *memsto.NotifyConfigCacheType) {
	for {
		time.Sleep(1 * time.Minute)
		smtp := ncc.GetSMTP()
@@ -134,12 +127,12 @@ func updateSmtp(ctx *ctx.Context, ncc *memsto.NotifyConfigCacheType) {
			smtpConfig.Pass != smtp.Pass || smtpConfig.User != smtp.User || smtpConfig.Port != smtp.Port ||
			smtpConfig.InsecureSkipVerify != smtp.InsecureSkipVerify { //diff
			smtpConfig = smtp
			RestartEmailSender(ctx, smtp)
			RestartEmailSender(smtp)
		}
	}
}

func startEmailSender(ctx *ctx.Context, smtp aconf.SMTPConfig) {
func startEmailSender(smtp aconf.SMTPConfig) {
	conf := smtp
	if conf.Host == "" || conf.Port == 0 {
		logger.Warning("SMTP configurations invalid")
@@ -174,8 +167,7 @@ func startEmailSender(ctx *ctx.Context, smtp aconf.SMTPConfig) {
			}
			open = true
		}
		var err error
		if err = gomail.Send(s, m.mail); err != nil {
		if err := gomail.Send(s, m); err != nil {
			logger.Errorf("email_sender: failed to send: %s", err)

			// close and retry
@@ -192,16 +184,11 @@ func startEmailSender(ctx *ctx.Context, smtp aconf.SMTPConfig) {
			}
			open = true

			if err = gomail.Send(s, m.mail); err != nil {
			if err := gomail.Send(s, m); err != nil {
				logger.Errorf("email_sender: failed to retry send: %s", err)
			}
		} else {
			logger.Infof("email_sender: result=succ subject=%v to=%v",
				m.mail.GetHeader("Subject"), m.mail.GetHeader("To"))
		}

		for _, to := range m.mail.GetHeader("To") {
			NotifyRecord(ctx, m.event, models.Email, to, "", err)
		logger.Infof("email_sender: result=succ subject=%v to=%v", m.GetHeader("Subject"), m.GetHeader("To"))
		}

		size++

@@ -54,8 +54,7 @@ func (fs *FeishuSender) CallBack(ctx CallBackContext) {
		},
	}

	doSendAndRecord(ctx.Ctx, ctx.CallBackURL, ctx.CallBackURL, body, "callback",
		ctx.Stats, ctx.Events[0])
	doSend(ctx.CallBackURL, body, models.Feishu, ctx.Stats)
	ctx.Stats.AlertNotifyTotal.WithLabelValues("rule_callback").Inc()
}

@@ -63,9 +62,9 @@ func (fs *FeishuSender) Send(ctx MessageContext) {
	if len(ctx.Users) == 0 || len(ctx.Events) == 0 {
		return
	}
	urls, ats, tokens := fs.extract(ctx.Users)
	urls, ats := fs.extract(ctx.Users)
	message := BuildTplMessage(models.Feishu, fs.tpl, ctx.Events)
	for i, url := range urls {
	for _, url := range urls {
		body := feishu{
			Msgtype: "text",
			Content: feishuContent{
@@ -78,14 +77,13 @@ func (fs *FeishuSender) Send(ctx MessageContext) {
				IsAtAll: false,
			}
		}
		doSendAndRecord(ctx.Ctx, url, tokens[i], body, models.Feishu, ctx.Stats, ctx.Events[0])
		doSend(url, body, models.Feishu, ctx.Stats)
	}
}

func (fs *FeishuSender) extract(users []*models.User) ([]string, []string, []string) {
func (fs *FeishuSender) extract(users []*models.User) ([]string, []string) {
	urls := make([]string, 0, len(users))
	ats := make([]string, 0, len(users))
	tokens := make([]string, 0, len(users))

	for _, user := range users {
		if user.Phone != "" {
@@ -97,8 +95,7 @@ func (fs *FeishuSender) extract(users []*models.User) ([]string, []string, []str
				url = "https://open.feishu.cn/open-apis/bot/v2/hook/" + token
			}
			urls = append(urls, url)
			tokens = append(tokens, token)
		}
	}
	return urls, ats, tokens
	return urls, ats
}

@@ -134,15 +134,14 @@ func (fs *FeishuCardSender) CallBack(ctx CallBackContext) {
	}
	parsedURL.RawQuery = ""

	doSendAndRecord(ctx.Ctx, parsedURL.String(), parsedURL.String(), body, "callback",
		ctx.Stats, ctx.Events[0])
	doSend(parsedURL.String(), body, models.FeishuCard, ctx.Stats)
}

func (fs *FeishuCardSender) Send(ctx MessageContext) {
	if len(ctx.Users) == 0 || len(ctx.Events) == 0 {
		return
	}
	urls, tokens := fs.extract(ctx.Users)
	urls, _ := fs.extract(ctx.Users)
	message := BuildTplMessage(models.FeishuCard, fs.tpl, ctx.Events)
	color := "red"
	lowerUnicode := strings.ToLower(message)
@@ -157,15 +156,14 @@ func (fs *FeishuCardSender) Send(ctx MessageContext) {
	body.Card.Header.Template = color
	body.Card.Elements[0].Text.Content = message
	body.Card.Elements[2].Elements[0].Content = SendTitle
	for i, url := range urls {
		doSendAndRecord(ctx.Ctx, url, tokens[i], body, models.FeishuCard,
			ctx.Stats, ctx.Events[0])
	for _, url := range urls {
		doSend(url, body, models.FeishuCard, ctx.Stats)
	}
}

func (fs *FeishuCardSender) extract(users []*models.User) ([]string, []string) {
	urls := make([]string, 0, len(users))
	tokens := make([]string, 0, len(users))
	ats := make([]string, 0)
	for i := range users {
		if token, has := users[i].ExtractToken(models.FeishuCard); has {
			url := token
@@ -173,8 +171,7 @@ func (fs *FeishuCardSender) extract(users []*models.User) ([]string, []string) {
				url = "https://open.feishu.cn/open-apis/bot/v2/hook/" + strings.TrimSpace(token)
			}
			urls = append(urls, url)
			tokens = append(tokens, token)
		}
	}
	return urls, tokens
	return urls, ats
}

@@ -12,7 +12,8 @@ import (
	"github.com/ccfos/nightingale/v6/memsto"
	"github.com/ccfos/nightingale/v6/models"
	"github.com/ccfos/nightingale/v6/pkg/ctx"
	"github.com/ccfos/nightingale/v6/storage"
	imodels "github.com/flashcatcloud/ibex/src/models"
	"github.com/flashcatcloud/ibex/src/storage"

	"github.com/toolkits/pkg/logger"
)
@@ -42,7 +43,7 @@ func (c *IbexCallBacker) CallBack(ctx CallBackContext) {
}

func (c *IbexCallBacker) handleIbex(ctx *ctx.Context, url string, event *models.AlertCurEvent) {
	if models.DB(ctx) == nil && ctx.IsCenter {
	if imodels.DB() == nil && ctx.IsCenter {
		logger.Warning("event_callback_ibex: db is nil")
		return
	}
@@ -76,20 +77,15 @@ func (c *IbexCallBacker) handleIbex(ctx *ctx.Context, url string, event *models.
		return
	}

	CallIbex(ctx, id, host, c.taskTplCache, c.targetCache, c.userCache, event)
}

func CallIbex(ctx *ctx.Context, id int64, host string,
	taskTplCache *memsto.TaskTplCache, targetCache *memsto.TargetCacheType,
	userCache *memsto.UserCacheType, event *models.AlertCurEvent) {
	tpl := taskTplCache.Get(id)
	tpl := c.taskTplCache.Get(id)
	if tpl == nil {
		logger.Errorf("event_callback_ibex: no such tpl(%d)", id)
		return
	}

	// check perm
	// verify permission on the (tpl.GroupId, host, account) triple
	can, err := canDoIbex(tpl.UpdateBy, tpl, host, targetCache, userCache)
	can, err := canDoIbex(tpl.UpdateBy, tpl, host, c.targetCache, c.userCache)
	if err != nil {
		logger.Errorf("event_callback_ibex: check perm fail: %v", err)
		return
@@ -141,7 +137,7 @@ func CallIbex(ctx *ctx.Context, id int64, host string,
		AlertTriggered: true,
	}

	id, err = TaskAdd(ctx, in, tpl.UpdateBy, ctx.IsCenter)
	id, err = TaskAdd(in, tpl.UpdateBy, ctx.IsCenter)
	if err != nil {
		logger.Errorf("event_callback_ibex: call ibex fail: %v", err)
		return
@@ -183,13 +179,13 @@ func canDoIbex(username string, tpl *models.TaskTpl, host string, targetCache *m
	return target.GroupId == tpl.GroupId, nil
}

func TaskAdd(ctx *ctx.Context, f models.TaskForm, authUser string, isCenter bool) (int64, error) {
func TaskAdd(f models.TaskForm, authUser string, isCenter bool) (int64, error) {
	hosts := cleanHosts(f.Hosts)
	if len(hosts) == 0 {
		return 0, fmt.Errorf("arg(hosts) empty")
	}

	taskMeta := &models.TaskMeta{
	taskMeta := &imodels.TaskMeta{
		Title:   f.Title,
		Account: f.Account,
		Batch:   f.Batch,
@@ -212,34 +208,34 @@ func TaskAdd(ctx *ctx.Context, f models.TaskForm, authUser string, isCenter bool
	// Tasks come in two kinds: "alert-rule triggered" and "issued by an n9e center user".
	// Alert-triggered tasks in an edge data center need no planning, and the edge may be disconnected and unable to use db resources, so they are put into the redis cache and dispatched directly to agentd for execution.
	if !isCenter && f.AlertTriggered {
		if err := taskMeta.Create(ctx); err != nil {
		if err := taskMeta.Create(); err != nil {
			// When the network is unreachable, generate a unique id to keep different tasks in an edge data center from colliding;
			// a redis auto-increment id prevents different n9e edges in the same data center from generating the same id,
			// but it cannot prevent different edge data centers from generating the same id, so id-generated data is never reported to the database and is only used for closed-loop execution.
			taskMeta.Id, err = storage.IdGet(ctx.Redis)
			taskMeta.Id, err = storage.IdGet()
			if err != nil {
				return 0, err
			}
		}

		taskHost := models.TaskHost{
		taskHost := imodels.TaskHost{
			Id:     taskMeta.Id,
			Host:   hosts[0],
			Status: "running",
		}
		if err = taskHost.Create(ctx); err != nil {
		if err = taskHost.Create(); err != nil {
			logger.Warningf("task_add_fail: authUser=%s title=%s err=%s", authUser, taskMeta.Title, err.Error())
		}

		// cache the task metadata and the task pending dispatch
		err = taskMeta.Cache(ctx, hosts[0])
		err = taskMeta.Cache(hosts[0])
		if err != nil {
			return 0, err
		}

	} else {
		// For the center data center, keep the previous logic.
		err = taskMeta.Save(ctx, hosts, f.Action)
		err = taskMeta.Save(hosts, f.Action)
		if err != nil {
			return 0, err
		}

@@ -27,8 +27,7 @@ func (lk *LarkSender) CallBack(ctx CallBackContext) {
 		},
 	}
 
-	doSendAndRecord(ctx.Ctx, ctx.CallBackURL, ctx.CallBackURL, body, "callback",
-		ctx.Stats, ctx.Events[0])
+	doSend(ctx.CallBackURL, body, models.Lark, ctx.Stats)
 	ctx.Stats.AlertNotifyTotal.WithLabelValues("rule_callback").Inc()
 }
 
@@ -55,8 +55,7 @@ func (fs *LarkCardSender) CallBack(ctx CallBackContext) {
 	}
 	parsedURL.RawQuery = ""
 
-	doSendAndRecord(ctx.Ctx, ctx.CallBackURL, ctx.CallBackURL, body, "callback",
-		ctx.Stats, ctx.Events[0])
+	doSend(parsedURL.String(), body, models.LarkCard, ctx.Stats)
 }
 
 func (fs *LarkCardSender) Send(ctx MessageContext) {
@@ -7,7 +7,6 @@ import (
 
 	"github.com/ccfos/nightingale/v6/alert/astats"
 	"github.com/ccfos/nightingale/v6/models"
-	"github.com/ccfos/nightingale/v6/pkg/ctx"
 
 	"github.com/toolkits/pkg/logger"
 )
@@ -39,11 +38,11 @@ func (ms *MmSender) Send(ctx MessageContext) {
 	}
 	message := BuildTplMessage(models.Mm, ms.tpl, ctx.Events)
 
-	SendMM(ctx.Ctx, MatterMostMessage{
+	SendMM(MatterMostMessage{
 		Text:   message,
 		Tokens: urls,
 		Stats:  ctx.Stats,
-	}, ctx.Events[0])
+	})
 }
 
 func (ms *MmSender) CallBack(ctx CallBackContext) {
@@ -52,11 +51,11 @@ func (ms *MmSender) CallBack(ctx CallBackContext) {
 	}
 	message := BuildTplMessage(models.Mm, ms.tpl, ctx.Events)
 
-	SendMM(ctx.Ctx, MatterMostMessage{
+	SendMM(MatterMostMessage{
 		Text:   message,
 		Tokens: []string{ctx.CallBackURL},
 		Stats:  ctx.Stats,
-	}, ctx.Events[0])
+	})
 
 	ctx.Stats.AlertNotifyTotal.WithLabelValues("rule_callback").Inc()
 }
@@ -71,7 +70,7 @@ func (ms *MmSender) extract(users []*models.User) []string {
 	return tokens
 }
 
-func SendMM(ctx *ctx.Context, message MatterMostMessage, event *models.AlertCurEvent) {
+func SendMM(message MatterMostMessage) {
 	for i := 0; i < len(message.Tokens); i++ {
 		u, err := url.Parse(message.Tokens[i])
 		if err != nil {
@@ -104,7 +103,7 @@ func SendMM(ctx *ctx.Context, message MatterMostMessage, event *models.AlertCurE
 			Username: username,
 			Text:     txt + message.Text,
 		}
-		doSendAndRecord(ctx, ur, message.Tokens[i], body, models.Mm, message.Stats, event)
+		doSend(ur, body, models.Mm, message.Stats)
 		}
 	}
 }
 
@@ -2,30 +2,26 @@ package sender
 
 import (
 	"bytes"
 	"fmt"
 	"os"
 	"os/exec"
 	"time"
 
 	"github.com/ccfos/nightingale/v6/alert/astats"
 	"github.com/ccfos/nightingale/v6/models"
-	"github.com/ccfos/nightingale/v6/pkg/ctx"
 
 	"github.com/toolkits/pkg/file"
 	"github.com/toolkits/pkg/logger"
 	"github.com/toolkits/pkg/sys"
 )
 
-func MayPluginNotify(ctx *ctx.Context, noticeBytes []byte, notifyScript models.NotifyScript,
-	stats *astats.Stats, event *models.AlertCurEvent) {
+func MayPluginNotify(noticeBytes []byte, notifyScript models.NotifyScript, stats *astats.Stats) {
 	if len(noticeBytes) == 0 {
 		return
 	}
-	alertingCallScript(ctx, noticeBytes, notifyScript, stats, event)
+	alertingCallScript(noticeBytes, notifyScript, stats)
 }
 
-func alertingCallScript(ctx *ctx.Context, stdinBytes []byte, notifyScript models.NotifyScript,
-	stats *astats.Stats, event *models.AlertCurEvent) {
+func alertingCallScript(stdinBytes []byte, notifyScript models.NotifyScript, stats *astats.Stats) {
 	// not enable or no notify.py? do nothing
 	config := notifyScript
 	if !config.Enable || config.Content == "" {
@@ -85,7 +81,6 @@ func alertingCallScript(ctx *ctx.Context, stdinBytes []byte, notifyScript models
 	}
 
 	err, isTimeout := sys.WrapTimeout(cmd, time.Duration(config.Timeout)*time.Second)
-	NotifyRecord(ctx, event, channel, cmd.String(), "", buildErr(err, isTimeout))
 
 	if isTimeout {
 		if err == nil {
@@ -107,11 +102,3 @@ func alertingCallScript(ctx *ctx.Context, stdinBytes []byte, notifyScript models
 
 	logger.Infof("event_script_notify_ok: exec %s output: %s", fpath, buf.String())
 }
-
-func buildErr(err error, isTimeout bool) error {
-	if err == nil && !isTimeout {
-		return nil
-	} else {
-		return fmt.Errorf("is_timeout: %v, err: %v", isTimeout, err)
-	}
-}
 
@@ -8,7 +8,6 @@ import (
 	"github.com/ccfos/nightingale/v6/alert/astats"
 	"github.com/ccfos/nightingale/v6/memsto"
 	"github.com/ccfos/nightingale/v6/models"
-	"github.com/ccfos/nightingale/v6/pkg/ctx"
 )
 
 type (
@@ -23,7 +22,6 @@ type (
 		Rule   *models.AlertRule
 		Events []*models.AlertCurEvent
 		Stats  *astats.Stats
-		Ctx    *ctx.Context
 	}
 )
 
@@ -51,15 +49,13 @@ func NewSender(key string, tpls map[string]*template.Template, smtp ...aconf.SMT
 	return nil
 }
 
-func BuildMessageContext(ctx *ctx.Context, rule *models.AlertRule, events []*models.AlertCurEvent,
-	uids []int64, userCache *memsto.UserCacheType, stats *astats.Stats) MessageContext {
+func BuildMessageContext(rule *models.AlertRule, events []*models.AlertCurEvent, uids []int64, userCache *memsto.UserCacheType, stats *astats.Stats) MessageContext {
 	users := userCache.GetByUserIds(uids)
 	return MessageContext{
 		Rule:   rule,
 		Events: events,
 		Users:  users,
 		Stats:  stats,
-		Ctx:    ctx,
 	}
 }
 
@@ -6,7 +6,6 @@ import (
 
 	"github.com/ccfos/nightingale/v6/alert/astats"
 	"github.com/ccfos/nightingale/v6/models"
-	"github.com/ccfos/nightingale/v6/pkg/ctx"
 
 	"github.com/toolkits/pkg/logger"
 )
@@ -36,11 +35,11 @@ func (ts *TelegramSender) CallBack(ctx CallBackContext) {
 	}
 
 	message := BuildTplMessage(models.Telegram, ts.tpl, ctx.Events)
-	SendTelegram(ctx.Ctx, TelegramMessage{
+	SendTelegram(TelegramMessage{
 		Text:   message,
 		Tokens: []string{ctx.CallBackURL},
 		Stats:  ctx.Stats,
-	}, ctx.Events[0])
+	})
 
 	ctx.Stats.AlertNotifyTotal.WithLabelValues("rule_callback").Inc()
 }
@@ -52,11 +51,11 @@ func (ts *TelegramSender) Send(ctx MessageContext) {
 	tokens := ts.extract(ctx.Users)
 	message := BuildTplMessage(models.Telegram, ts.tpl, ctx.Events)
 
-	SendTelegram(ctx.Ctx, TelegramMessage{
+	SendTelegram(TelegramMessage{
 		Text:   message,
 		Tokens: tokens,
 		Stats:  ctx.Stats,
-	}, ctx.Events[0])
+	})
 }
 
 func (ts *TelegramSender) extract(users []*models.User) []string {
@@ -69,7 +68,7 @@ func (ts *TelegramSender) extract(users []*models.User) []string {
 	return tokens
 }
 
-func SendTelegram(ctx *ctx.Context, message TelegramMessage, event *models.AlertCurEvent) {
+func SendTelegram(message TelegramMessage) {
 	for i := 0; i < len(message.Tokens); i++ {
 		if !strings.Contains(message.Tokens[i], "/") && !strings.HasPrefix(message.Tokens[i], "https://") {
 			logger.Errorf("telegram_sender: result=fail invalid token=%s", message.Tokens[i])
@@ -93,6 +92,6 @@ func SendTelegram(ctx *ctx.Context, message TelegramMessage, event *models.Alert
 			Text: message.Text,
 		}
 
-		doSendAndRecord(ctx, url, message.Tokens[i], body, models.Telegram, message.Stats, event)
+		doSend(url, body, models.Telegram, message.Stats)
 	}
 }
 
@@ -4,7 +4,6 @@ import (
 	"bytes"
 	"crypto/tls"
 	"encoding/json"
-	"fmt"
 	"io"
 	"net/http"
 	"sync"
@@ -12,12 +11,11 @@ import (
 
 	"github.com/ccfos/nightingale/v6/alert/astats"
 	"github.com/ccfos/nightingale/v6/models"
-	"github.com/ccfos/nightingale/v6/pkg/ctx"
 
 	"github.com/toolkits/pkg/logger"
 )
 
-func sendWebhook(webhook *models.Webhook, event interface{}, stats *astats.Stats) (bool, string, error) {
+func sendWebhook(webhook *models.Webhook, event interface{}, stats *astats.Stats) bool {
 	channel := "webhook"
 	if webhook.Type == models.RuleCallback {
 		channel = "callback"
@@ -25,12 +23,12 @@ func sendWebhook(webhook *models.Webhook, event interface{}, stats *astats.Stats
 
 	conf := webhook
 	if conf.Url == "" || !conf.Enable {
-		return false, "", nil
+		return false
 	}
 	bs, err := json.Marshal(event)
 	if err != nil {
 		logger.Errorf("%s alertingWebhook failed to marshal event:%+v err:%v", channel, event, err)
-		return false, "", err
+		return false
 	}
 
 	bf := bytes.NewBuffer(bs)
@@ -38,7 +36,7 @@ func sendWebhook(webhook *models.Webhook, event interface{}, stats *astats.Stats
 	req, err := http.NewRequest("POST", conf.Url, bf)
 	if err != nil {
 		logger.Warningf("%s alertingWebhook failed to new reques event:%s err:%v", channel, string(bs), err)
-		return true, "", err
+		return true
 	}
 
 	req.Header.Set("Content-Type", "application/json")
@@ -68,15 +66,14 @@ func sendWebhook(webhook *models.Webhook, event interface{}, stats *astats.Stats
 
 	stats.AlertNotifyTotal.WithLabelValues(channel).Inc()
 	var resp *http.Response
+	var body []byte
 	resp, err = client.Do(req)
 
 	if err != nil {
 		stats.AlertNotifyErrorTotal.WithLabelValues(channel).Inc()
 		logger.Errorf("event_%s_fail, event:%s, url: [%s], error: [%s]", channel, string(bs), conf.Url, err)
-		return true, "", err
+		return true
 	}
 
-	var body []byte
 	if resp.Body != nil {
 		defer resp.Body.Close()
 		body, _ = io.ReadAll(resp.Body)
@@ -84,19 +81,18 @@ func sendWebhook(webhook *models.Webhook, event interface{}, stats *astats.Stats
 
 	if resp.StatusCode == 429 {
 		logger.Errorf("event_%s_fail, url: %s, response code: %d, body: %s event:%s", channel, conf.Url, resp.StatusCode, string(body), string(bs))
-		return true, string(body), fmt.Errorf("status code is 429")
+		return true
 	}
 
 	logger.Debugf("event_%s_succ, url: %s, response code: %d, body: %s event:%s", channel, conf.Url, resp.StatusCode, string(body), string(bs))
-	return false, string(body), nil
+	return false
 }
 
-func SingleSendWebhooks(ctx *ctx.Context, webhooks []*models.Webhook, event *models.AlertCurEvent, stats *astats.Stats) {
+func SingleSendWebhooks(webhooks []*models.Webhook, event *models.AlertCurEvent, stats *astats.Stats) {
 	for _, conf := range webhooks {
 		retryCount := 0
 		for retryCount < 3 {
-			needRetry, res, err := sendWebhook(conf, event, stats)
-			NotifyRecord(ctx, event, "webhook", conf.Url, res, err)
+			needRetry := sendWebhook(conf, event, stats)
 			if !needRetry {
 				break
 			}
@@ -106,10 +102,10 @@ func SingleSendWebhooks(ctx *ctx.Context, webhooks []*models.Webhook, event *mod
 	}
 }
 
-func BatchSendWebhooks(ctx *ctx.Context, webhooks []*models.Webhook, event *models.AlertCurEvent, stats *astats.Stats) {
+func BatchSendWebhooks(webhooks []*models.Webhook, event *models.AlertCurEvent, stats *astats.Stats) {
 	for _, conf := range webhooks {
 		logger.Infof("push event:%+v to queue:%v", event, conf)
-		PushEvent(ctx, conf, event, stats)
+		PushEvent(conf, event, stats)
 	}
 }
 
@@ -125,7 +121,7 @@ type WebhookQueue struct {
 	closeCh chan struct{}
 }
 
-func PushEvent(ctx *ctx.Context, webhook *models.Webhook, event *models.AlertCurEvent, stats *astats.Stats) {
+func PushEvent(webhook *models.Webhook, event *models.AlertCurEvent, stats *astats.Stats) {
 	EventQueueLock.RLock()
 	queue := EventQueue[webhook.Url]
 	EventQueueLock.RUnlock()
@@ -140,7 +136,7 @@ func PushEvent(ctx *ctx.Context, webhook *models.Webhook, event *models.AlertCur
 		EventQueue[webhook.Url] = queue
 		EventQueueLock.Unlock()
 
-		StartConsumer(ctx, queue, webhook.Batch, webhook, stats)
+		StartConsumer(queue, webhook.Batch, webhook, stats)
 	}
 
 	succ := queue.list.PushFront(event)
@@ -150,7 +146,7 @@ func PushEvent(ctx *ctx.Context, webhook *models.Webhook, event *models.AlertCur
 	}
 }
 
-func StartConsumer(ctx *ctx.Context, queue *WebhookQueue, popSize int, webhook *models.Webhook, stats *astats.Stats) {
+func StartConsumer(queue *WebhookQueue, popSize int, webhook *models.Webhook, stats *astats.Stats) {
 	for {
 		select {
 		case <-queue.closeCh:
@@ -165,21 +161,13 @@ func StartConsumer(ctx *ctx.Context, queue *WebhookQueue, popSize int, webhook *
 
 			retryCount := 0
 			for retryCount < webhook.RetryCount {
-				needRetry, res, err := sendWebhook(webhook, events, stats)
+				needRetry := sendWebhook(webhook, events, stats)
 				if !needRetry {
 					break
 				}
-				go RecordEvents(ctx, webhook, events, stats, res, err)
 				retryCount++
 				time.Sleep(time.Second * time.Duration(webhook.RetryInterval) * time.Duration(retryCount))
 			}
 		}
 	}
 }
-
-func RecordEvents(ctx *ctx.Context, webhook *models.Webhook, events []*models.AlertCurEvent, stats *astats.Stats, res string, err error) {
-	for _, event := range events {
-		time.Sleep(time.Millisecond * 10)
-		NotifyRecord(ctx, event, "webhook", webhook.Url, res, err)
-	}
-}
 
@@ -37,8 +37,7 @@ func (ws *WecomSender) CallBack(ctx CallBackContext) {
 		},
 	}
 
-	doSendAndRecord(ctx.Ctx, ctx.CallBackURL, ctx.CallBackURL, body, "callback",
-		ctx.Stats, ctx.Events[0])
+	doSend(ctx.CallBackURL, body, models.Wecom, ctx.Stats)
 	ctx.Stats.AlertNotifyTotal.WithLabelValues("rule_callback").Inc()
 }
 
@@ -46,22 +45,21 @@ func (ws *WecomSender) Send(ctx MessageContext) {
 	if len(ctx.Users) == 0 || len(ctx.Events) == 0 {
 		return
 	}
-	urls, tokens := ws.extract(ctx.Users)
+	urls := ws.extract(ctx.Users)
 	message := BuildTplMessage(models.Wecom, ws.tpl, ctx.Events)
-	for i, url := range urls {
+	for _, url := range urls {
 		body := wecom{
 			Msgtype: "markdown",
 			Markdown: wecomMarkdown{
 				Content: message,
 			},
 		}
-		doSendAndRecord(ctx.Ctx, url, tokens[i], body, models.Wecom, ctx.Stats, ctx.Events[0])
+		doSend(url, body, models.Wecom, ctx.Stats)
 	}
 }
 
-func (ws *WecomSender) extract(users []*models.User) ([]string, []string) {
+func (ws *WecomSender) extract(users []*models.User) []string {
 	urls := make([]string, 0, len(users))
-	tokens := make([]string, 0, len(users))
 	for _, user := range users {
 		if token, has := user.ExtractToken(models.Wecom); has {
 			url := token
@@ -69,8 +67,7 @@ func (ws *WecomSender) extract(users []*models.User) ([]string, []string) {
 				url = "https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=" + token
 			}
 			urls = append(urls, url)
-			tokens = append(tokens, token)
 		}
 	}
-	return urls, tokens
+	return urls
 }
 
@@ -13,7 +13,6 @@ type Center struct {
 	UseFileAssets         bool
 	FlashDuty             FlashDuty
 	EventHistoryGroupView bool
-	CleanNotifyRecordDay  int
 }
 
 type Plugin struct {
 
@@ -16,9 +16,7 @@ import (
 	centerrt "github.com/ccfos/nightingale/v6/center/router"
 	"github.com/ccfos/nightingale/v6/center/sso"
 	"github.com/ccfos/nightingale/v6/conf"
-	"github.com/ccfos/nightingale/v6/cron"
 	"github.com/ccfos/nightingale/v6/dumper"
-	"github.com/ccfos/nightingale/v6/ibex"
 	"github.com/ccfos/nightingale/v6/memsto"
 	"github.com/ccfos/nightingale/v6/models"
 	"github.com/ccfos/nightingale/v6/models/migrate"
@@ -34,6 +32,8 @@ import (
 	"github.com/ccfos/nightingale/v6/pushgw/writer"
 	"github.com/ccfos/nightingale/v6/storage"
 	"github.com/ccfos/nightingale/v6/tdengine"
+
+	"github.com/flashcatcloud/ibex/src/cmd/ibex"
 )
 
 func Initialize(configDir string, cryptoKey string) (func(), error) {
@@ -60,14 +60,7 @@ func Initialize(configDir string, cryptoKey string) (func(), error) {
 	if err != nil {
 		return nil, err
 	}
 
-	var redis storage.Redis
-	redis, err = storage.NewRedis(config.Redis)
-	if err != nil {
-		return nil, err
-	}
-
-	ctx := ctx.NewContext(context.Background(), db, redis, true)
+	ctx := ctx.NewContext(context.Background(), db, true)
 	migrate.Migrate(db)
 	models.InitRoot(ctx)
 
@@ -79,6 +72,11 @@ func Initialize(configDir string, cryptoKey string) (func(), error) {
 	}
 
 	integration.Init(ctx, config.Center.BuiltinIntegrationsDir)
+	var redis storage.Redis
+	redis, err = storage.NewRedis(config.Redis)
+	if err != nil {
+		return nil, err
+	}
 
 	metas := metas.New(redis)
 	idents := idents.New(ctx, redis)
@@ -108,8 +106,6 @@ func Initialize(configDir string, cryptoKey string) (func(), error) {
 
 	go version.GetGithubVersion()
 
-	go cron.CleanNotifyRecord(ctx, config.Center.CleanNotifyRecordDay)
-
 	alertrtRouter := alertrt.New(config.HTTP, config.Alert, alertMuteCache, targetCache, busiGroupCache, alertStats, ctx, externalProcessors)
 	centerRouter := centerrt.New(config.HTTP, config.Center, config.Alert, config.Ibex, cconf.Operations, dsCache, notifyConfigCache, promClients, tdengineClients,
 		redis, sso, ctx, metas, idents, targetCache, userCache, userGroupCache)
@@ -124,10 +120,10 @@ func Initialize(configDir string, cryptoKey string) (func(), error) {
 
 	if config.Ibex.Enable {
 		migrate.MigrateIbexTables(db)
-		ibex.ServerStart(ctx, true, db, redis, config.HTTP.APIForService.BasicAuth, config.Alert.Heartbeat, &config.CenterApi, r, centerRouter, config.Ibex, config.HTTP.Port)
+		ibex.ServerStart(true, db, redis, config.HTTP.APIForService.BasicAuth, config.Alert.Heartbeat, &config.CenterApi, r, centerRouter, config.Ibex, config.HTTP.Port)
 	}
 
-	httpClean := httpx.Init(config.HTTP, context.Background(), r)
+	httpClean := httpx.Init(config.HTTP, r)
 
 	return func() {
 		logxClean()
 
@@ -315,8 +315,6 @@ func (rt *Router) Config(r *gin.Engine) {
 		pages.GET("/busi-group/:id/alert-rules", rt.auth(), rt.user(), rt.perm("/alert-rules"), rt.alertRuleGets)
 		pages.POST("/busi-group/:id/alert-rules", rt.auth(), rt.user(), rt.perm("/alert-rules/add"), rt.bgrw(), rt.alertRuleAddByFE)
 		pages.POST("/busi-group/:id/alert-rules/import", rt.auth(), rt.user(), rt.perm("/alert-rules/add"), rt.bgrw(), rt.alertRuleAddByImport)
-		pages.POST("/busi-group/:id/alert-rules/import-prom-rule", rt.auth(),
-			rt.user(), rt.perm("/alert-rules/add"), rt.bgrw(), rt.alertRuleAddByImportPromRule)
 		pages.DELETE("/busi-group/:id/alert-rules", rt.auth(), rt.user(), rt.perm("/alert-rules/del"), rt.bgrw(), rt.alertRuleDel)
 		pages.PUT("/busi-group/:id/alert-rules/fields", rt.auth(), rt.user(), rt.perm("/alert-rules/put"), rt.bgrw(), rt.alertRulePutFields)
 		pages.PUT("/busi-group/:id/alert-rule/:arid", rt.auth(), rt.user(), rt.perm("/alert-rules/put"), rt.alertRulePutByFE)
@@ -324,7 +322,6 @@ func (rt *Router) Config(r *gin.Engine) {
 		pages.GET("/alert-rule/:arid/pure", rt.auth(), rt.user(), rt.perm("/alert-rules"), rt.alertRulePureGet)
 		pages.PUT("/busi-group/alert-rule/validate", rt.auth(), rt.user(), rt.perm("/alert-rules/put"), rt.alertRuleValidation)
 		pages.POST("/relabel-test", rt.auth(), rt.user(), rt.relabelTest)
-		pages.POST("/busi-group/:id/alert-rules/clone", rt.auth(), rt.user(), rt.perm("/alert-rules/post"), rt.bgrw(), rt.cloneToMachine)
 
 		pages.GET("/busi-groups/recording-rules", rt.auth(), rt.user(), rt.perm("/recording-rules"), rt.recordingRuleGetsByGids)
 		pages.GET("/busi-group/:id/recording-rules", rt.auth(), rt.user(), rt.perm("/recording-rules"), rt.recordingRuleGets)
@@ -353,11 +350,9 @@ func (rt *Router) Config(r *gin.Engine) {
 		if rt.Center.AnonymousAccess.AlertDetail {
 			pages.GET("/alert-cur-event/:eid", rt.alertCurEventGet)
 			pages.GET("/alert-his-event/:eid", rt.alertHisEventGet)
-			pages.GET("/event-notify-records/:eid", rt.notificationRecordList)
 		} else {
 			pages.GET("/alert-cur-event/:eid", rt.auth(), rt.user(), rt.alertCurEventGet)
 			pages.GET("/alert-his-event/:eid", rt.auth(), rt.user(), rt.alertHisEventGet)
-			pages.GET("/event-notify-records/:eid", rt.auth(), rt.user(), rt.notificationRecordList)
 		}
 
 		// card logic
@@ -556,9 +551,6 @@ func (rt *Router) Config(r *gin.Engine) {
 
 		service.GET("/targets-of-alert-rule", rt.targetsOfAlertRule)
 
-		service.POST("/notify-record", rt.notificationRecordAdd)
-
-		service.GET("/alert-cur-events-del-by-hash", rt.alertCurEventDelByHash)
 	}
 }
 
@@ -222,11 +222,6 @@ func (rt *Router) alertCurEventGet(c *gin.Context) {
 		rt.bgroCheck(c, event.GroupId)
 	}
 
-	ruleConfig, needReset := models.FillRuleConfigTplName(rt.Ctx, event.RuleConfig)
-	if needReset {
-		event.RuleConfigJson = ruleConfig
-	}
-
 	ginx.NewRender(c).Data(event, nil)
 }
 
@@ -234,8 +229,3 @@ func (rt *Router) alertCurEventsStatistics(c *gin.Context) {
 
 	ginx.NewRender(c).Data(models.AlertCurEventStatistics(rt.Ctx, time.Now()), nil)
 }
-
-func (rt *Router) alertCurEventDelByHash(c *gin.Context) {
-	hash := ginx.QueryStr(c, "hash")
-	ginx.NewRender(c).Message(models.AlertCurEventDelByHash(rt.Ctx, hash))
-}
 
@@ -87,11 +87,6 @@ func (rt *Router) alertHisEventGet(c *gin.Context) {
 		rt.bgroCheck(c, event.GroupId)
 	}
 
-	ruleConfig, needReset := models.FillRuleConfigTplName(rt.Ctx, event.RuleConfig)
-	if needReset {
-		event.RuleConfigJson = ruleConfig
-	}
-
 	ginx.NewRender(c).Data(event, err)
 }
 
@@ -1,22 +1,17 @@
 package router
 
 import (
-	"encoding/json"
 	"fmt"
 	"net/http"
-	"regexp"
 	"strconv"
 	"strings"
 	"time"
 
-	"gopkg.in/yaml.v2"
-
 	"github.com/ccfos/nightingale/v6/models"
 	"github.com/ccfos/nightingale/v6/pushgw/pconf"
 	"github.com/ccfos/nightingale/v6/pushgw/writer"
 
 	"github.com/gin-gonic/gin"
-	"github.com/jinzhu/copier"
 	"github.com/prometheus/prometheus/prompb"
 	"github.com/toolkits/pkg/ginx"
 	"github.com/toolkits/pkg/i18n"
@@ -130,34 +125,6 @@ func (rt *Router) alertRuleAddByImport(c *gin.Context) {
 	ginx.NewRender(c).Data(reterr, nil)
 }
 
-type promRuleForm struct {
-	Payload       string  `json:"payload" binding:"required"`
-	DatasourceIds []int64 `json:"datasource_ids" binding:"required"`
-	Disabled      int     `json:"disabled" binding:"gte=0,lte=1"`
-}
-
-func (rt *Router) alertRuleAddByImportPromRule(c *gin.Context) {
-	var f promRuleForm
-	ginx.Dangerous(c.BindJSON(&f))
-
-	var pr struct {
-		Groups []models.PromRuleGroup `yaml:"groups"`
-	}
-	err := yaml.Unmarshal([]byte(f.Payload), &pr)
-	if err != nil {
-		ginx.Bomb(http.StatusBadRequest, "invalid yaml format, please use the example format. err: %v", err)
-	}
-
-	if len(pr.Groups) == 0 {
-		ginx.Bomb(http.StatusBadRequest, "input yaml is empty")
-	}
-
-	lst := models.DealPromGroup(pr.Groups, f.DatasourceIds, f.Disabled)
-	username := c.MustGet("username").(string)
-	bgid := ginx.UrlParamInt64(c, "id")
-	ginx.NewRender(c).Data(rt.alertRuleAdd(lst, username, bgid, c.GetHeader("X-Language")), nil)
-}
-
 func (rt *Router) alertRuleAddByService(c *gin.Context) {
 	var lst []models.AlertRule
 	ginx.BindJSON(c, &lst)
@@ -308,43 +275,6 @@ func (rt *Router) alertRulePutFields(c *gin.Context) {
 			continue
 		}
 
-		if f.Action == "update_triggers" {
-			if triggers, has := f.Fields["triggers"]; has {
-				originRule := ar.RuleConfigJson.(map[string]interface{})
-				originRule["triggers"] = triggers
-				b, err := json.Marshal(originRule)
-				ginx.Dangerous(err)
-				ginx.Dangerous(ar.UpdateFieldsMap(rt.Ctx, map[string]interface{}{"rule_config": string(b)}))
-				continue
-			}
-		}
-
-		if f.Action == "annotations_add" {
-			if annotations, has := f.Fields["annotations"]; has {
-				annotationsMap := annotations.(map[string]interface{})
-				for k, v := range annotationsMap {
-					ar.AnnotationsJSON[k] = v.(string)
-				}
-				b, err := json.Marshal(ar.AnnotationsJSON)
-				ginx.Dangerous(err)
-				ginx.Dangerous(ar.UpdateFieldsMap(rt.Ctx, map[string]interface{}{"annotations": string(b)}))
-				continue
-			}
-		}
-
-		if f.Action == "annotations_del" {
-			if annotations, has := f.Fields["annotations"]; has {
-				annotationsKeys := annotations.(map[string]interface{})
-				for key := range annotationsKeys {
-					delete(ar.AnnotationsJSON, key)
-				}
-				b, err := json.Marshal(ar.AnnotationsJSON)
-				ginx.Dangerous(err)
-				ginx.Dangerous(ar.UpdateFieldsMap(rt.Ctx, map[string]interface{}{"annotations": string(b)}))
-				continue
-			}
-		}
-
 		if f.Action == "callback_add" {
 			// add a callback address
 			if callbacks, has := f.Fields["callbacks"]; has {
@@ -523,71 +453,3 @@ func (rt *Router) relabelTest(c *gin.Context) {
 
 	ginx.NewRender(c).Data(tags, nil)
 }
-
-type identListForm struct {
-	Ids       []int64  `json:"ids"`
-	IdentList []string `json:"ident_list"`
-}
-
-func (rt *Router) cloneToMachine(c *gin.Context) {
-	var f identListForm
-	ginx.BindJSON(c, &f)
-
-	if len(f.IdentList) == 0 {
-		ginx.Bomb(http.StatusBadRequest, "ident_list is empty")
-	}
-
-	alertRules, err := models.AlertRuleGetsByIds(rt.Ctx, f.Ids)
-	ginx.Dangerous(err)
-
-	re := regexp.MustCompile(`ident\s*=\s*\\".*?\\"`)
-
-	user := c.MustGet("username").(string)
-	now := time.Now().Unix()
-
-	newRules := make([]*models.AlertRule, 0)
-
-	reterr := make(map[string]map[string]string)
-
-	for i := range alertRules {
-		reterr[alertRules[i].Name] = make(map[string]string)
-
-		if alertRules[i].Cate != "prometheus" {
-			reterr[alertRules[i].Name]["all"] = "Only Prometheus rules can be cloned to machines"
-			continue
-		}
-
-		for j := range f.IdentList {
-			alertRules[i].RuleConfig = re.ReplaceAllString(alertRules[i].RuleConfig, fmt.Sprintf(`ident=\"%s\"`, f.IdentList[j]))
-
-			newRule := &models.AlertRule{}
-			if err := copier.Copy(newRule, alertRules[i]); err != nil {
-				reterr[alertRules[i].Name][f.IdentList[j]] = fmt.Sprintf("fail to clone rule, err: %s", err)
-				continue
-			}
-
-			newRule.Id = 0
-			newRule.Name = alertRules[i].Name + "_" + f.IdentList[j]
-			newRule.CreateBy = user
-			newRule.UpdateBy = user
-			newRule.UpdateAt = now
-			newRule.CreateAt = now
-			newRule.RuleConfig = alertRules[i].RuleConfig
-
-			exist, err := models.AlertRuleExists(rt.Ctx, 0, newRule.GroupId, newRule.DatasourceIdsJson, newRule.Name)
-			if err != nil {
-				reterr[alertRules[i].Name][f.IdentList[j]] = err.Error()
-				continue
-			}
-
-			if exist {
-				reterr[alertRules[i].Name][f.IdentList[j]] = fmt.Sprintf("rule already exists, ruleName: %s", newRule.Name)
-				continue
-			}
-
-			newRules = append(newRules, newRule)
-		}
-	}
-
-	ginx.NewRender(c).Data(reterr, models.InsertAlertRule(rt.Ctx, newRules))
-}
 
@@ -3,9 +3,8 @@ package router
 import (
 	"compress/gzip"
 	"encoding/json"
-	"errors"
 	"fmt"
 	"io/ioutil"
 	"sort"
 	"strings"
 	"time"
@@ -58,7 +57,7 @@ func HandleHeartbeat(c *gin.Context, ctx *ctx.Context, engineName string, metaSe
 	}
 
 	if req.Hostname == "" {
-		return req, errors.New("hostname is required")
+		return req, fmt.Errorf("hostname is required", 400)
 	}
 
 	// maybe from pushgw
@@ -83,87 +82,52 @@ func HandleHeartbeat(c *gin.Context, ctx *ctx.Context, engineName string, metaSe
gid := ginx.QueryInt64(c, "gid", 0)
hostIp := strings.TrimSpace(req.HostIp)

newTarget := models.Target{}
targetNeedUpdate := false
field := make(map[string]interface{})
if gid != 0 && gid != target.GroupId {
newTarget.GroupId = gid
targetNeedUpdate = true
field["group_id"] = gid
}

if hostIp != "" && hostIp != target.HostIp {
newTarget.HostIp = hostIp
targetNeedUpdate = true
field["host_ip"] = hostIp
}

hostTagsMap := target.GetHostTagsMap()
hostTagNeedUpdate := false
if len(hostTagsMap) != len(req.GlobalLabels) {
hostTagNeedUpdate = true
} else {
for k, v := range req.GlobalLabels {
if v == "" {
continue
}

if tagv, ok := hostTagsMap[k]; !ok || tagv != v {
hostTagNeedUpdate = true
break
}
}
}

if hostTagNeedUpdate {
lst := []string{}
for k, v := range req.GlobalLabels {
lst = append(lst, k+"="+v)
}
sort.Strings(lst)
newTarget.HostTags = lst
targetNeedUpdate = true
}

userTagsMap := target.GetTagsMap()
userTagNeedUpdate := false
userTags := []string{}
for k, v := range userTagsMap {
tagsMap := target.GetTagsMap()
tagNeedUpdate := false
for k, v := range req.GlobalLabels {
if v == "" {
continue
}

if _, ok := req.GlobalLabels[k]; !ok {
userTags = append(userTags, k+"="+v)
} else { // this key already exists in hostTags
userTagNeedUpdate = true
if tagv, ok := tagsMap[k]; !ok || tagv != v {
tagNeedUpdate = true
tagsMap[k] = v
}
}

if userTagNeedUpdate {
newTarget.Tags = strings.Join(userTags, " ") + " "
targetNeedUpdate = true
if tagNeedUpdate {
lst := []string{}
for k, v := range tagsMap {
lst = append(lst, k+"="+v)
}
labels := strings.Join(lst, " ") + " "
field["tags"] = labels
}

if req.EngineName != "" && req.EngineName != target.EngineName {
newTarget.EngineName = req.EngineName
targetNeedUpdate = true
field["engine_name"] = req.EngineName
}

if req.AgentVersion != "" && req.AgentVersion != target.AgentVersion {
newTarget.AgentVersion = req.AgentVersion
targetNeedUpdate = true
field["agent_version"] = req.AgentVersion
}

if req.OS != "" && req.OS != target.OS {
newTarget.OS = req.OS
targetNeedUpdate = true
}

if targetNeedUpdate {
err := models.DB(ctx).Model(&target).Updates(newTarget).Error
if len(field) > 0 {
err := target.UpdateFieldsMap(ctx, field)
if err != nil {
logger.Errorf("update target fields failed, err: %v", err)
}
}
logger.Debugf("heartbeat field:%+v target: %v", newTarget, *target)
logger.Debugf("heartbeat field:%+v target: %v", field, *target)
}

return req, nil

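The host-tag branch of the hunk above rebuilds a sorted `k=v` slice from the request's global labels before persisting it. A minimal, self-contained sketch of that transformation (the `buildHostTags` helper name is illustrative, not part of the repo):

```go
package main

import (
	"fmt"
	"sort"
)

// buildHostTags mirrors the heartbeat logic above: turn a label map into a
// deterministic, sorted slice of "k=v" strings, skipping empty values.
func buildHostTags(labels map[string]string) []string {
	lst := make([]string, 0, len(labels))
	for k, v := range labels {
		if v == "" {
			continue
		}
		lst = append(lst, k+"="+v)
	}
	sort.Strings(lst)
	return lst
}

func main() {
	fmt.Println(buildHostTags(map[string]string{"region": "bj", "env": "prod", "skip": ""}))
}
```

Sorting makes the stored tag string stable across heartbeats, so the "need update" comparison does not flap on map iteration order.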
@@ -1,205 +0,0 @@
package router

import (
"strings"

"github.com/ccfos/nightingale/v6/models"
"github.com/ccfos/nightingale/v6/pkg/ctx"

"github.com/gin-gonic/gin"
"github.com/toolkits/pkg/ginx"
"github.com/toolkits/pkg/logger"
)

type NotificationResponse struct {
SubRules []SubRule `json:"sub_rules"`
Notifies map[string][]Record `json:"notifies"`
}

type SubRule struct {
SubID int64 `json:"sub_id"`
Notifies map[string][]Record `json:"notifies"`
}

type Notify struct {
Channel string `json:"channel"`
Records []Record `json:"records"`
}

type Record struct {
Target string `json:"target"`
Username string `json:"username"`
Status int `json:"status"`
Detail string `json:"detail"`
}

// notificationRecordAdd
func (rt *Router) notificationRecordAdd(c *gin.Context) {
var req models.NotificaitonRecord
ginx.BindJSON(c, &req)
err := req.Add(rt.Ctx)

ginx.NewRender(c).Data(req.Id, err)
}

func (rt *Router) notificationRecordList(c *gin.Context) {
eid := ginx.UrlParamInt64(c, "eid")
lst, err := models.NotificaitonRecordsGetByEventId(rt.Ctx, eid)
ginx.Dangerous(err)

response := buildNotificationResponse(rt.Ctx, lst)
ginx.NewRender(c).Data(response, nil)
}

func buildNotificationResponse(ctx *ctx.Context, nl []*models.NotificaitonRecord) NotificationResponse {
response := NotificationResponse{
SubRules: []SubRule{},
Notifies: make(map[string][]Record),
}

subRuleMap := make(map[int64]*SubRule)

// Collect all group IDs
groupIdSet := make(map[int64]struct{})

// map[SubId]map[Channel]map[Target]index
filter := make(map[int64]map[string]map[string]int)

for i, n := range nl {
// merge records that share the same channel-target
for _, gid := range n.GetGroupIds(ctx) {
groupIdSet[gid] = struct{}{}
}

if _, exists := filter[n.SubId]; !exists {
filter[n.SubId] = make(map[string]map[string]int)
}

if _, exists := filter[n.SubId][n.Channel]; !exists {
filter[n.SubId][n.Channel] = make(map[string]int)
}

idx, exists := filter[n.SubId][n.Channel][n.Target]
if !exists {
filter[n.SubId][n.Channel][n.Target] = i
} else {
if nl[idx].Status < n.Status {
nl[idx].Status = n.Status
}
nl[idx].Details = nl[idx].Details + ", " + n.Details
nl[i] = nil
}

}

// Fill usernames only once
usernameByTarget := fillUserNames(ctx, groupIdSet)

for _, n := range nl {
if n == nil {
continue
}

m := usernameByTarget[n.Target]
usernames := make([]string, 0, len(m))
for k := range m {
usernames = append(usernames, k)
}

if !checkChannel(n.Channel) {
// Hide sensitive information
n.Target = replaceLastEightChars(n.Target)
}
record := Record{
Target: n.Target,
Status: n.Status,
Detail: n.Details,
}

record.Username = strings.Join(usernames, ",")

if n.SubId > 0 {
// Handle SubRules
subRule, ok := subRuleMap[n.SubId]
if !ok {
newSubRule := &SubRule{
SubID: n.SubId,
}
newSubRule.Notifies = make(map[string][]Record)
newSubRule.Notifies[n.Channel] = []Record{record}

subRuleMap[n.SubId] = newSubRule
} else {
if _, exists := subRule.Notifies[n.Channel]; !exists {

subRule.Notifies[n.Channel] = []Record{record}
} else {
subRule.Notifies[n.Channel] = append(subRule.Notifies[n.Channel], record)
}
}
continue
}

if response.Notifies == nil {
response.Notifies = make(map[string][]Record)
}

if _, exists := response.Notifies[n.Channel]; !exists {
response.Notifies[n.Channel] = []Record{record}
} else {
response.Notifies[n.Channel] = append(response.Notifies[n.Channel], record)
}
}

for _, subRule := range subRuleMap {
response.SubRules = append(response.SubRules, *subRule)
}

return response
}

// check channel is one of the following: tx-sms, tx-voice, ali-sms, ali-voice, email, script
func checkChannel(channel string) bool {
switch channel {
case "tx-sms", "tx-voice", "ali-sms", "ali-voice", "email", "script":
return true
}
return false
}

func replaceLastEightChars(s string) string {
if len(s) <= 8 {
return strings.Repeat("*", len(s))
}
return s[:len(s)-8] + strings.Repeat("*", 8)
}

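`replaceLastEightChars` above masks the tail of a notification target (for example a phone number) before it is returned to the UI. A quick standalone sketch of the same masking rule, renamed `maskTarget` here for illustration:

```go
package main

import (
	"fmt"
	"strings"
)

// maskTarget hides the last eight characters of a target; strings of
// eight characters or fewer are masked entirely.
func maskTarget(s string) string {
	if len(s) <= 8 {
		return strings.Repeat("*", len(s))
	}
	return s[:len(s)-8] + strings.Repeat("*", 8)
}

func main() {
	fmt.Println(maskTarget("13812345678")) // 138********
	fmt.Println(maskTarget("short"))       // *****
}
```

Note this counts bytes, not runes, which is fine for the ASCII targets (phone numbers, tokens) it is applied to.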
func fillUserNames(ctx *ctx.Context, groupIdSet map[int64]struct{}) map[string]map[string]struct{} {
userNameByTarget := make(map[string]map[string]struct{})

gids := make([]int64, 0, len(groupIdSet))
for gid := range groupIdSet {
gids = append(gids, gid)
}

users, err := models.UsersGetByGroupIds(ctx, gids)
if err != nil {
logger.Errorf("UsersGetByGroupIds failed, err: %v", err)
return userNameByTarget
}

for _, user := range users {
logger.Warningf("user: %s", user.Username)
for _, ch := range models.DefaultChannels {
target, exist := user.ExtractToken(ch)
if exist {
if _, ok := userNameByTarget[target]; !ok {
userNameByTarget[target] = make(map[string]struct{})
}
userNameByTarget[target][user.Username] = struct{}{}
}
}
}

return userNameByTarget
}
@@ -182,7 +182,7 @@ func (rt *Router) notifyConfigPut(c *gin.Context) {

smtp, errSmtp := SmtpValidate(text)
ginx.Dangerous(errSmtp)
go sender.RestartEmailSender(rt.Ctx, smtp)
go sender.RestartEmailSender(smtp)
}

ginx.NewRender(c).Message(nil)

@@ -68,7 +68,7 @@ func (rt *Router) notifyTplUpdate(c *gin.Context) {
}

// get the count of the same channel and name but different id
count, err := models.Count(models.DB(rt.Ctx).Model(&models.NotifyTpl{}).Where("(channel = ? or name = ?) and id <> ?", f.Channel, f.Name, f.Id))
count, err := models.Count(models.DB(rt.Ctx).Model(&models.NotifyTpl{}).Where("channel = ? or name = ? and id <> ?", f.Channel, f.Name, f.Id))
ginx.Dangerous(err)
if count != 0 {
ginx.Bomb(200, "Refuse to create duplicate channel or name")

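The two `Where` variants in this hunk differ only in parentheses, but the difference matters: SQL gives `AND` higher precedence than `OR`, so `channel = ? or name = ? and id <> ?` parses as `channel = ? OR (name = ? AND id <> ?)` and the `id <> ?` exclusion no longer applies to the channel match. Go's `||`/`&&` follow the same precedence rule, which this small sketch demonstrates:

```go
package main

import "fmt"

func main() {
	// A row whose channel matches the template being updated itself.
	channelMatch, nameMatch, sameId := true, false, true

	// Without parentheses: && binds tighter than ||, so the id check
	// only guards the name match and the row is still counted.
	unparenthesized := channelMatch || nameMatch && !sameId

	// With parentheses: the id check applies to both conditions,
	// so the record being updated is correctly excluded.
	parenthesized := (channelMatch || nameMatch) && !sameId

	fmt.Println(unparenthesized, parenthesized) // true false
}
```

This is why the parenthesized form is the one that matches the comment's intent ("same channel and name but different id").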
@@ -157,8 +157,7 @@ func (rt *Router) targetGetsByService(c *gin.Context) {
func (rt *Router) targetGetTags(c *gin.Context) {
idents := ginx.QueryStr(c, "idents", "")
idents = strings.ReplaceAll(idents, ",", " ")
ignoreHostTag := ginx.QueryBool(c, "ignore_host_tag", false)
lst, err := models.TargetGetTags(rt.Ctx, strings.Fields(idents), ignoreHostTag)
lst, err := models.TargetGetTags(rt.Ctx, strings.Fields(idents))
ginx.NewRender(c).Data(lst, err)
}

@@ -261,11 +260,9 @@ func (rt *Router) validateTags(tags []string) error {
}

func (rt *Router) addTagsToTarget(target *models.Target, tags []string) error {
hostTagsMap := target.GetHostTagsMap()
for _, tag := range tags {
tagKey := strings.Split(tag, "=")[0]
if _, ok := hostTagsMap[tagKey]; ok ||
strings.Contains(target.Tags, tagKey+"=") {
if strings.Contains(target.Tags, tagKey+"=") {
return fmt.Errorf("duplicate tagkey(%s)", tagKey)
}
}

@@ -126,7 +126,7 @@ func (rt *Router) taskAdd(c *gin.Context) {
rt.checkTargetPerm(c, f.Hosts)

// call ibex
taskId, err := sender.TaskAdd(rt.Ctx, f, user.Username, rt.Ctx.IsCenter)
taskId, err := sender.TaskAdd(f, user.Username, rt.Ctx.IsCenter)
ginx.Dangerous(err)

if taskId <= 0 {

@@ -196,13 +196,6 @@ func (rt *Router) taskTplDel(c *gin.Context) {
return
}

ids, err := models.GetAlertRuleIdsByTaskId(rt.Ctx, tid)
ginx.Dangerous(err)
if len(ids) > 0 {
ginx.NewRender(c).Message("can't del this task tpl, used by alert rule ids(%v) ", ids)
return
}

ginx.NewRender(c).Message(tpl.Del(rt.Ctx))
}


@@ -48,7 +48,7 @@ func (rt *Router) userGets(c *gin.Context) {
order := ginx.QueryStr(c, "order", "username")
desc := ginx.QueryBool(c, "desc", false)

go rt.UserCache.UpdateUsersLastActiveTime()
rt.UserCache.UpdateUsersLastActiveTime()
total, err := models.UserTotal(rt.Ctx, query, stime, etime)
ginx.Dangerous(err)


@@ -1,7 +1,6 @@
package sso

import (
"fmt"
"log"
"time"

@@ -146,7 +145,7 @@ func Init(center cconf.Center, ctx *ctx.Context, configCache *memsto.ConfigCache
}
}
if configCache == nil {
log.Fatalln(fmt.Errorf("configCache is nil, sso initialization failed"))
logger.Error("configCache is nil, sso initialization failed")
}
ssoClient.configCache = configCache
userVariableMap := configCache.Get()

@@ -18,7 +18,7 @@ func Upgrade(configFile string) error {
return err
}

ctx := ctx.NewContext(context.Background(), db, nil, true)
ctx := ctx.NewContext(context.Background(), db, true)
for _, cluster := range config.Clusters {
count, err := models.GetDatasourcesCountByName(ctx, cluster.Name)
if err != nil {

@@ -12,7 +12,6 @@ import (
"github.com/ccfos/nightingale/v6/center/metas"
"github.com/ccfos/nightingale/v6/conf"
"github.com/ccfos/nightingale/v6/dumper"
"github.com/ccfos/nightingale/v6/ibex"
"github.com/ccfos/nightingale/v6/memsto"
"github.com/ccfos/nightingale/v6/pkg/ctx"
"github.com/ccfos/nightingale/v6/pkg/httpx"
@@ -23,6 +22,8 @@ import (
"github.com/ccfos/nightingale/v6/pushgw/writer"
"github.com/ccfos/nightingale/v6/storage"
"github.com/ccfos/nightingale/v6/tdengine"

"github.com/flashcatcloud/ibex/src/cmd/ibex"
)

func Initialize(configDir string, cryptoKey string) (func(), error) {
@@ -39,6 +40,7 @@ func Initialize(configDir string, cryptoKey string) (func(), error) {
if len(config.CenterApi.Addrs) < 1 {
return nil, errors.New("failed to init config: the CenterApi configuration is missing")
}
ctx := ctx.NewContext(context.Background(), nil, false, config.CenterApi)

var redis storage.Redis
redis, err = storage.NewRedis(config.Redis)
@@ -46,8 +48,6 @@ func Initialize(configDir string, cryptoKey string) (func(), error) {
return nil, err
}

ctx := ctx.NewContext(context.Background(), nil, redis, false, config.CenterApi)

syncStats := memsto.NewSyncStats()

targetCache := memsto.NewTargetCache(ctx, syncStats, redis)
@@ -82,12 +82,12 @@ func Initialize(configDir string, cryptoKey string) (func(), error) {
alertrtRouter.Config(r)

if config.Ibex.Enable {
ibex.ServerStart(ctx, false, nil, redis, config.HTTP.APIForService.BasicAuth, config.Alert.Heartbeat, &config.CenterApi, r, nil, config.Ibex, config.HTTP.Port)
ibex.ServerStart(false, nil, redis, config.HTTP.APIForService.BasicAuth, config.Alert.Heartbeat, &config.CenterApi, r, nil, config.Ibex, config.HTTP.Port)
}
}

dumper.ConfigRouter(r)
httpClean := httpx.Init(config.HTTP, context.Background(), r)
httpClean := httpx.Init(config.HTTP, r)

return func() {
logxClean()

119 cmd/ibex/main.go
@@ -1,119 +0,0 @@
package main

import (
"fmt"
"os"

"github.com/ccfos/nightingale/v6/ibex/agentd"
"github.com/ccfos/nightingale/v6/ibex/server"

"github.com/toolkits/pkg/net/tcpx"
"github.com/toolkits/pkg/runner"
"github.com/urfave/cli/v2"
)

// VERSION go build -ldflags "-X main.VERSION=x.x.x"
var VERSION = "not specified"

func main() {
app := cli.NewApp()
app.Name = "ibex"
app.Version = VERSION
app.Usage = "Ibex, running scripts on large scale machines"
app.Commands = []*cli.Command{
newCenterServerCmd(),
newEdgeServerCmd(),
newAgentdCmd(),
}
app.Run(os.Args)
}

func newCenterServerCmd() *cli.Command {
return &cli.Command{
Name: "server",
Usage: "Run server",
Flags: []cli.Flag{
&cli.StringFlag{
Name: "conf",
Aliases: []string{"c"},
Usage: "specify configuration file(.json,.yaml,.toml)",
},
},
Action: func(c *cli.Context) error {
printEnv()

tcpx.WaitHosts()

var opts []server.ServerOption
if c.String("conf") != "" {
opts = append(opts, server.SetConfigFile(c.String("conf")))
}
opts = append(opts, server.SetVersion(VERSION))

server.Run(true, opts...)
return nil
},
}
}

func newEdgeServerCmd() *cli.Command {
return &cli.Command{
Name: "edge server",
Usage: "Run edge server",
Flags: []cli.Flag{
&cli.StringFlag{
Name: "conf",
Aliases: []string{"c"},
Usage: "specify configuration file(.json,.yaml,.toml)",
},
},
Action: func(c *cli.Context) error {
printEnv()

tcpx.WaitHosts()

var opts []server.ServerOption
if c.String("conf") != "" {
opts = append(opts, server.SetConfigFile(c.String("conf")))
}
opts = append(opts, server.SetVersion(VERSION))

server.Run(false, opts...)
return nil
},
}
}

func newAgentdCmd() *cli.Command {
return &cli.Command{
Name: "agentd",
Usage: "Run agentd",
Flags: []cli.Flag{
&cli.StringFlag{
Name: "conf",
Aliases: []string{"c"},
Usage: "specify configuration file(.json,.yaml,.toml)",
},
},
Action: func(c *cli.Context) error {
printEnv()

var opts []agentd.AgentdOption
if c.String("conf") != "" {
opts = append(opts, agentd.SetConfigFile(c.String("conf")))
}
opts = append(opts, agentd.SetVersion(VERSION))

agentd.Run(opts...)
return nil
},
}
}

func printEnv() {
runner.Init()
fmt.Println("runner.cwd:", runner.Cwd)
fmt.Println("runner.hostname:", runner.Hostname)
fmt.Println("runner.fd_limits:", runner.FdLimits())
fmt.Println("runner.vm_limits:", runner.VMLimits())
}

@@ -1,41 +0,0 @@
package cron

import (
"time"

"github.com/ccfos/nightingale/v6/models"
"github.com/ccfos/nightingale/v6/pkg/ctx"

"github.com/robfig/cron/v3"
"github.com/toolkits/pkg/logger"
)

func cleanNotifyRecord(ctx *ctx.Context, day int) {
lastWeek := time.Now().Unix() - 86400*int64(day)
err := models.DB(ctx).Model(&models.NotificaitonRecord{}).Where("created_at < ?", lastWeek).Delete(&models.NotificaitonRecord{}).Error
if err != nil {
logger.Errorf("Failed to clean notify record: %v", err)
}

}

// run the cleanup task at 1:00 AM every day
func CleanNotifyRecord(ctx *ctx.Context, day int) {
c := cron.New()
if day < 1 {
day = 7
}

// use a cron expression to schedule the job at 1:00 AM daily
_, err := c.AddFunc("0 1 * * *", func() {
cleanNotifyRecord(ctx, day)
})

if err != nil {
logger.Errorf("Failed to add clean notify record cron job: %v", err)
return
}

// start the cron scheduler
c.Start()
}

@@ -1,6 +1,6 @@
set names utf8mb4;

-- drop database if exists n9e_v6;
drop database if exists n9e_v6;
create database n9e_v6;
use n9e_v6;

@@ -366,7 +366,6 @@ CREATE TABLE `target` (
`host_ip` varchar(15) default '' COMMENT 'IPv4 string',
`agent_version` varchar(255) default '' COMMENT 'agent version',
`engine_name` varchar(255) default '' COMMENT 'engine_name',
`os` VARCHAR(31) DEFAULT '' COMMENT 'os type',
`update_at` bigint not null default 0,
PRIMARY KEY (`id`),
UNIQUE KEY (`ident`),
@@ -547,18 +546,6 @@ CREATE TABLE `builtin_payloads` (
KEY `idx_type` (`type`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

CREATE TABLE notification_record (
`id` BIGINT PRIMARY KEY AUTO_INCREMENT,
`event_id` BIGINT NOT NULL,
`sub_id` BIGINT NOT NULL,
`channel` VARCHAR(255) NOT NULL,
`status` TINYINT NOT NULL DEFAULT 0,
`target` VARCHAR(1024) NOT NULL,
`details` VARCHAR(2048),
`created_at` BIGINT NOT NULL,
INDEX idx_evt (event_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

CREATE TABLE `task_tpl`
(
`id` int unsigned NOT NULL AUTO_INCREMENT,

@@ -83,24 +83,4 @@ ALTER TABLE recording_rule ADD COLUMN cron_pattern VARCHAR(255) DEFAULT '' COMME

/* v7.0.0-beta.14 */
ALTER TABLE alert_cur_event ADD COLUMN original_tags TEXT COMMENT 'labels key=val,,k2=v2';
ALTER TABLE alert_his_event ADD COLUMN original_tags TEXT COMMENT 'labels key=val,,k2=v2';

/* v7.1.0 */
ALTER TABLE target ADD COLUMN os VARCHAR(31) DEFAULT '' COMMENT 'os type';

/* v7.2.0 */
CREATE TABLE notification_record (
`id` BIGINT PRIMARY KEY AUTO_INCREMENT,
`event_id` BIGINT NOT NULL,
`sub_id` BIGINT NOT NULL,
`channel` VARCHAR(255) NOT NULL,
`status` TINYINT NOT NULL DEFAULT 0,
`target` VARCHAR(1024) NOT NULL,
`details` VARCHAR(2048),
`created_at` BIGINT NOT NULL,
INDEX idx_evt (event_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;


/* v7.3.0 2024-08-26 */
ALTER TABLE `target` ADD COLUMN `host_tags` TEXT COMMENT 'global labels set in conf file';
ALTER TABLE alert_his_event ADD COLUMN original_tags TEXT COMMENT 'labels key=val,,k2=v2';
@@ -1,38 +0,0 @@
# debug, release
RunMode = "debug"

# task meta storage dir
MetaDir = "./meta"

[HTTP]
Enable = true
# http listening address
Host = "0.0.0.0"
# http listening port
Port = 2090
# https cert file path
CertFile = ""
# https key file path
KeyFile = ""
# whether print access log
PrintAccessLog = true
# whether enable pprof
PProf = false
# http graceful shutdown timeout, unit: s
ShutdownTimeout = 30
# max content length: 64M
MaxContentLength = 67108864
# http server read timeout, unit: s
ReadTimeout = 20
# http server write timeout, unit: s
WriteTimeout = 40
# http server idle timeout, unit: s
IdleTimeout = 120

[Heartbeat]
# unit: ms
Interval = 1000
# rpc servers
Servers = ["127.0.0.1:20090"]
# $ip or $hostname or specified string
Host = "$hostname"
@@ -1,20 +0,0 @@
[Unit]
Description="ibex-agentd"
After=network.target

[Service]
Type=simple

ExecStart=/root/gopath/ibex/ibex agentd
WorkingDirectory=/root/gopath/ibex

Restart=on-failure
SuccessExitStatus=0
LimitNOFILE=65536
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=ibex-agentd


[Install]
WantedBy=multi-user.target
@@ -1,20 +0,0 @@
[Unit]
Description="ibex-server"
After=network.target

[Service]
Type=simple

ExecStart=/root/gopath/ibex/ibex server
WorkingDirectory=/root/gopath/ibex

Restart=on-failure
SuccessExitStatus=0
LimitNOFILE=65536
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=ibex-server


[Install]
WantedBy=multi-user.target
@@ -1,86 +0,0 @@
# debug, release
RunMode = "debug"

[Log]
# log write dir
Dir = "logs-server"
# log level: DEBUG INFO WARNING ERROR
Level = "DEBUG"
# stdout, stderr, file
Output = "stdout"
# # rotate by time
# KeepHours: 4
# # rotate by size
# RotateNum = 3
# # unit: MB
# RotateSize = 256

[HTTP]
Enable = true
# http listening address
Host = "0.0.0.0"
# http listening port
Port = 10090
# https cert file path
CertFile = ""
# https key file path
KeyFile = ""
# whether print access log
PrintAccessLog = true
# whether enable pprof
PProf = false
# http graceful shutdown timeout, unit: s
ShutdownTimeout = 30
# max content length: 64M
MaxContentLength = 67108864
# http server read timeout, unit: s
ReadTimeout = 20
# http server write timeout, unit: s
WriteTimeout = 40
# http server idle timeout, unit: s
IdleTimeout = 120

[BasicAuth]
# using when call apis
ibex = "ibex"

[RPC]
Listen = "0.0.0.0:20090"

[Heartbeat]
# auto detect if blank
IP = ""
# unit: ms
Interval = 1000

[Output]
# database | remote
ComeFrom = "database"
AgtdPort = 2090

[DB]
# postgres: host=%s port=%s user=%s dbname=%s password=%s sslmode=%s
# postgres: DSN="host=127.0.0.1 port=5432 user=root dbname=n9e_v6 password=1234 sslmode=disable"
DSN="root:1234@tcp(127.0.0.1:3306)/ibex?charset=utf8mb4&parseTime=True&loc=Local&allowNativePasswords=true"
# enable debug mode or not
Debug = false
# mysql postgres
DBType = "mysql"
# unit: s
MaxLifetime = 7200
# max open connections
MaxOpenConns = 150
# max idle connections
MaxIdleConns = 50
# table prefix
TablePrefix = ""

[Redis]
# address, ip:port or ip1:port,ip2:port for cluster and sentinel(SentinelAddrs)
Address = "127.0.0.1:6379"
# Username = ""
# Password = ""
# DB = 0
# UseTLS = false
# TLSMinVersion = "1.2"
# standalone cluster sentinel
File diff suppressed because one or more lines are too long

13 go.mod
@@ -3,11 +3,12 @@ module github.com/ccfos/nightingale/v6
go 1.18

require (
github.com/BurntSushi/toml v1.3.2
github.com/BurntSushi/toml v0.3.1
github.com/coreos/go-oidc v2.2.1+incompatible
github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc
github.com/dgrijalva/jwt-go v3.2.0+incompatible
github.com/expr-lang/expr v1.16.1
github.com/flashcatcloud/ibex v1.3.5
github.com/gin-contrib/pprof v1.4.0
github.com/gin-gonic/gin v1.9.1
github.com/go-ldap/ldap/v3 v3.4.4
@@ -17,7 +18,6 @@ require (
github.com/golang/snappy v0.0.4
github.com/google/uuid v1.3.0
github.com/hashicorp/go-version v1.6.0
github.com/jinzhu/copier v0.4.0
github.com/json-iterator/go v1.1.12
github.com/koding/multiconfig v0.0.0-20171124222453-69c27309b2d7
github.com/mailru/easyjson v0.7.7
@@ -33,7 +33,6 @@ require (
github.com/spaolacci/murmur3 v1.1.0
github.com/tidwall/gjson v1.14.0
github.com/toolkits/pkg v1.3.6
github.com/urfave/cli/v2 v2.27.4
golang.org/x/exp v0.0.0-20230713183714-613f0c0eb8a1
golang.org/x/oauth2 v0.10.0
gopkg.in/gomail.v2 v2.0.0-20160411212932-81ebce5c23df
@@ -44,12 +43,6 @@ require (
gorm.io/gorm v1.25.7-0.20240204074919-46816ad31dde
)

require (
github.com/cpuguy83/go-md2man/v2 v2.0.4 // indirect
github.com/russross/blackfriday/v2 v2.1.0 // indirect
github.com/xrash/smetrics v0.0.0-20240521201337-686a1a2994c1 // indirect
)

require (
github.com/Azure/go-ntlmssp v0.0.0-20220621081337-cb9428e4ac1e // indirect
github.com/beorn7/perks v1.0.1 // indirect
@@ -96,7 +89,7 @@ require (
github.com/tidwall/match v1.1.1 // indirect
github.com/tidwall/pretty v1.2.0 // indirect
github.com/twitchyliquid64/golang-asm v0.15.1 // indirect
github.com/ugorji/go/codec v1.2.11
github.com/ugorji/go/codec v1.2.11 // indirect
go.uber.org/atomic v1.11.0 // indirect
go.uber.org/automaxprocs v1.5.2 // indirect
golang.org/x/arch v0.3.0 // indirect

15 go.sum
@@ -5,9 +5,8 @@ github.com/Azure/azure-sdk-for-go/sdk/internal v1.3.0 h1:sXr+ck84g/ZlZUOZiNELInm
github.com/Azure/go-ntlmssp v0.0.0-20220621081337-cb9428e4ac1e h1:NeAW1fUYUEWhft7pkxDf6WoUvEZJ/uOKsvtpjLnn8MU=
github.com/Azure/go-ntlmssp v0.0.0-20220621081337-cb9428e4ac1e/go.mod h1:chxPXzSsl7ZWRAuOIE23GDNzjWuZquvFlgA8xmpunjU=
github.com/AzureAD/microsoft-authentication-library-for-go v1.0.0 h1:OBhqkivkhkMqLPymWEppkm7vgPQY2XsHoEkaMQ0AdZY=
github.com/BurntSushi/toml v0.3.1 h1:WXkYYl6Yr3qBf1K79EBnL4mak0OimBfB0XUf9Vl28OQ=
github.com/BurntSushi/toml v0.3.1/go.mod h1:xHWCNGjB5oqiDr8zfno3MHue2Ht5sIBksp03qcyfWMU=
github.com/BurntSushi/toml v1.3.2 h1:o7IhLm0Msx3BaB+n3Ag7L8EVlByGnpq14C4YWiu/gL8=
github.com/BurntSushi/toml v1.3.2/go.mod h1:CxXYINrC8qIiEnFrOxCa7Jy5BFHlXnUU2pbicEuybxQ=
github.com/Masterminds/semver/v3 v3.1.1 h1:hLg3sBzpNErnxhQtUy/mmLR2I9foDujNK030IGemrRc=
github.com/Masterminds/semver/v3 v3.1.1/go.mod h1:VPu/7SZ7ePZ3QOrcuXROw5FAcLl4a0cBrbBpGY/8hQs=
github.com/alecthomas/units v0.0.0-20211218093645-b94a6e3cc137 h1:s6gZFSlWYmbqAuRjVTiNNhvNRfY2Wxp9nhfyel4rklc=
@@ -30,8 +29,6 @@ github.com/coreos/go-oidc v2.2.1+incompatible h1:mh48q/BqXqgjVHpy2ZY7WnWAbenxRjs
github.com/coreos/go-oidc v2.2.1+incompatible/go.mod h1:CgnwVTmzoESiwO9qyAFEMiHoZ1nMCKZlZ9V6mm3/LKc=
github.com/coreos/go-systemd v0.0.0-20190321100706-95778dfbb74e/go.mod h1:F5haX7vjVVG0kc13fIWeqUViNPyEJxv/OmvnBo0Yme4=
github.com/coreos/go-systemd v0.0.0-20190719114852-fd7a80b32e1f/go.mod h1:F5haX7vjVVG0kc13fIWeqUViNPyEJxv/OmvnBo0Yme4=
github.com/cpuguy83/go-md2man/v2 v2.0.4 h1:wfIWP927BUkWJb2NmU/kNDYIBTh/ziUX91+lVfRxZq4=
github.com/cpuguy83/go-md2man/v2 v2.0.4/go.mod h1:tgQtvFlXSQOSOSIRvRPT7W67SCa46tRHOmNcaadrF8o=
github.com/creack/pty v1.1.7/go.mod h1:lj5s0c3V2DBrqTV7llrYr5NG6My20zk30Fl46Y7DoTY=
github.com/creack/pty v1.1.9/go.mod h1:oKZEueFk5CKHvIhNR5MUki03XCEU+Q6VDXinZuGJ33E=
github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
@@ -50,6 +47,8 @@ github.com/fatih/camelcase v1.0.0 h1:hxNvNX/xYBp0ovncs8WyWZrOrpBNub/JfaMvbURyft8
github.com/fatih/camelcase v1.0.0/go.mod h1:yN2Sb0lFhZJUdVvtELVWefmrXpuZESvPmqwoZc+/fpc=
github.com/fatih/structs v1.1.0 h1:Q7juDM0QtcnhCpeyLGQKyg4TOIghuNXrkL32pHAUMxo=
github.com/fatih/structs v1.1.0/go.mod h1:9NiDSp5zOcgEDl+j00MP/WkGVPOlPRLejGD8Ga6PJ7M=
github.com/flashcatcloud/ibex v1.3.5 h1:8GOOf5+aJT0TP/MC6izz7CO5JKJSdKVFBwL0vQp93Nc=
github.com/flashcatcloud/ibex v1.3.5/go.mod h1:T8hbMUySK2q6cXUaYp0AUVeKkU9Od2LjzwmB5lmTRBM=
github.com/gabriel-vasile/mimetype v1.4.2 h1:w5qFW6JKBz9Y393Y4q372O9A7cUSequkh1Q7OhCmWKU=
github.com/gabriel-vasile/mimetype v1.4.2/go.mod h1:zApsH/mKG4w07erKIaJPFiX0Tsq9BFQgN3qGY5GnNgA=
github.com/garyburd/redigo v1.6.2/go.mod h1:NR3MbYisc3/PwhQ00EMzDiPmrwpPxAn5GI05/YaO1SY=
@@ -161,8 +160,6 @@ github.com/jackc/puddle v0.0.0-20190413234325-e4ced69a3a2b/go.mod h1:m4B5Dj62Y0f
github.com/jackc/puddle v0.0.0-20190608224051-11cab39313c9/go.mod h1:m4B5Dj62Y0fbyuIc15OsIqK0+JU8nkqQjsgx7dvjSWk=
github.com/jackc/puddle v1.1.3/go.mod h1:m4B5Dj62Y0fbyuIc15OsIqK0+JU8nkqQjsgx7dvjSWk=
github.com/jackc/puddle v1.3.0/go.mod h1:m4B5Dj62Y0fbyuIc15OsIqK0+JU8nkqQjsgx7dvjSWk=
github.com/jinzhu/copier v0.4.0 h1:w3ciUoD19shMCRargcpm0cm91ytaBhDvuRpz1ODO/U8=
github.com/jinzhu/copier v0.4.0/go.mod h1:DfbEm0FYsaqBcKcFuvmOZb218JkPGtvSHsKg8S8hyyg=
github.com/jinzhu/inflection v1.0.0 h1:K317FqzuhWc8YvSVlFMCCUb36O/S9MCKRDI7QkRKD/E=
github.com/jinzhu/inflection v1.0.0/go.mod h1:h+uFLlag+Qp1Va5pdKtLDYj+kHp5pxUVkryuEj+Srlc=
github.com/jinzhu/now v1.1.4/go.mod h1:d3SSVoowX0Lcu0IBviAWJpolVfI5UJVZZ7cO71lE/z8=
@@ -263,8 +260,6 @@ github.com/rogpeppe/go-internal v1.10.0 h1:TMyTOH3F/DB16zRVcYyreMH6GnZZrwQVAoYjR
github.com/rs/xid v1.2.1/go.mod h1:+uKXf+4Djp6Md1KODXJxgGQPKngRmWyn10oCKFzNHOQ=
github.com/rs/zerolog v1.13.0/go.mod h1:YbFCdg8HfsridGWAh22vktObvhZbQsZXe4/zB0OKkWU=
github.com/rs/zerolog v1.15.0/go.mod h1:xYTKnLHcpfU2225ny5qZjxnj9NvkumZYjJHlAThCjNc=
github.com/russross/blackfriday/v2 v2.1.0 h1:JIOH55/0cWyOuilr9/qlrm0BSXldqnqwMsf35Ld67mk=
github.com/russross/blackfriday/v2 v2.1.0/go.mod h1:+Rmxgy9KzJVeS9/2gXHxylqXiyQDYRxCVz55jmeOWTM=
github.com/satori/go.uuid v1.2.0/go.mod h1:dA0hQrYB0VpLJoorglMZABFdXlWrHn1NEOzdhQKdks0=
github.com/shopspring/decimal v0.0.0-20180709203117-cd690d0c9e24/go.mod h1:M+9NzErvs504Cn4c5DxATwIqPbtswREoFCre64PpcG4=
github.com/shopspring/decimal v1.2.0 h1:abSATXmQEYyShuxI4/vyW3tV1MrKAJzCZ/0zLUXYbsQ=
@@ -305,10 +300,6 @@ github.com/ugorji/go v1.2.7/go.mod h1:nF9osbDWLy6bDVv/Rtoh6QgnvNDpmCalQV5urGCCS6
github.com/ugorji/go/codec v1.2.7/go.mod h1:WGN1fab3R1fzQlVQTkfxVtIBhWDRqOviHU95kRgeqEY=
github.com/ugorji/go/codec v1.2.11 h1:BMaWp1Bb6fHwEtbplGBGJ498wD+LKlNSl25MjdZY4dU=
github.com/ugorji/go/codec v1.2.11/go.mod h1:UNopzCgEMSXjBc6AOMqYvWC1ktqTAfzJZUZgYf6w6lg=
github.com/urfave/cli/v2 v2.27.4 h1:o1owoI+02Eb+K107p27wEX9Bb8eqIoZCfLXloLUSWJ8=
github.com/urfave/cli/v2 v2.27.4/go.mod h1:m4QzxcD2qpra4z7WhzEGn74WZLViBnMpb1ToCAKdGRQ=
github.com/xrash/smetrics v0.0.0-20240521201337-686a1a2994c1 h1:gEOO8jv9F4OT7lGCjxCBTO/36wtF6j2nSip77qHd4x4=
github.com/xrash/smetrics v0.0.0-20240521201337-686a1a2994c1/go.mod h1:Ohn+xnUBiLI6FVj/9LpzZWtj1/D6lUovWYBkxHVV3aM=
github.com/yuin/goldmark v1.1.27/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74=
github.com/yuin/goldmark v1.2.1/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74=
github.com/yuin/goldmark v1.4.13/go.mod h1:6yULJ656Px+3vBD8DxQVa3kxgyrAnzto9xy5taEt/CY=

@@ -1,117 +0,0 @@
package agentd

import (
	"context"
	"fmt"
	"log"
	"os"
	"os/signal"
	"path/filepath"
	"syscall"

	"github.com/toolkits/pkg/i18n"

	"github.com/ccfos/nightingale/v6/ibex/agentd/config"
	"github.com/ccfos/nightingale/v6/ibex/agentd/router"
	"github.com/ccfos/nightingale/v6/ibex/agentd/timer"
	"github.com/ccfos/nightingale/v6/pkg/httpx"
)

type Agentd struct {
	ConfigFile string
	Version    string
}

type AgentdOption func(*Agentd)

func SetConfigFile(f string) AgentdOption {
	return func(s *Agentd) {
		s.ConfigFile = f
	}
}

func SetVersion(v string) AgentdOption {
	return func(s *Agentd) {
		s.Version = v
	}
}

// Run runs agentd
func Run(opts ...AgentdOption) {
	code := 1
	sc := make(chan os.Signal, 1)
	signal.Notify(sc, syscall.SIGHUP, syscall.SIGINT, syscall.SIGTERM, syscall.SIGQUIT)

	agentd := Agentd{
		ConfigFile: filepath.Join("etc", "ibex", "agentd.toml"),
		Version:    "not specified",
	}

	for _, opt := range opts {
		opt(&agentd)
	}

	cleanFunc, err := agentd.initialize()
	if err != nil {
		fmt.Println("agentd init fail:", err)
		os.Exit(code)
	}

EXIT:
	for {
		sig := <-sc
		fmt.Println("received signal:", sig.String())
		switch sig {
		case syscall.SIGQUIT, syscall.SIGTERM, syscall.SIGINT:
			code = 0
			break EXIT
		case syscall.SIGHUP:
			// reload configuration?
		default:
			break EXIT
		}
	}

	cleanFunc()
	fmt.Println("agentd exited")
	os.Exit(code)
}

func (s Agentd) initialize() (func(), error) {
	fns := Functions{}
	ctx, cancel := context.WithCancel(context.Background())
	fns.Add(cancel)

	log.SetFlags(log.Ldate | log.Ltime | log.Lshortfile)

	// parse config file
	config.MustLoad(s.ConfigFile)

	// init i18n
	i18n.Init()

	// init http server
	r := router.New(s.Version)
	httpClean := httpx.Init(config.C.HTTP, ctx, r)
	fns.Add(httpClean)

	go timer.Heartbeat(ctx)

	return fns.Ret(), nil
}

type Functions struct {
	List []func()
}

func (fs *Functions) Add(f func()) {
	fs.List = append(fs.List, f)
}

func (fs *Functions) Ret() func() {
	return func() {
		for i := 0; i < len(fs.List); i++ {
			fs.List[i]()
		}
	}
}
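The `Functions` type removed above is a small cleanup aggregator: each subsystem registers a teardown callback during initialization, and `Ret()` returns one function that runs them all in registration order. A minimal standalone sketch of the same pattern (the example callbacks and `order` slice are ours, not from the source):

```go
package main

import "fmt"

// Functions collects cleanup callbacks and returns a single function
// that runs them in registration order, mirroring the aggregator
// agentd's initialize() used to hand its teardown back to Run().
type Functions struct {
	List []func()
}

func (fs *Functions) Add(f func()) {
	fs.List = append(fs.List, f)
}

func (fs *Functions) Ret() func() {
	return func() {
		for i := 0; i < len(fs.List); i++ {
			fs.List[i]()
		}
	}
}

func main() {
	var order []string
	fns := Functions{}
	fns.Add(func() { order = append(order, "cancel-ctx") })
	fns.Add(func() { order = append(order, "close-http") })

	clean := fns.Ret()
	clean()

	fmt.Println(order) // prints [cancel-ctx close-http]
}
```

Note the callbacks run first-registered-first, so a context cancel registered before the HTTP shutdown fires before it; a defer-style LIFO would need the loop reversed.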
@@ -1,110 +0,0 @@
package client

import (
	"bufio"
	"io"
	"log"
	"net"
	"net/rpc"
	"reflect"
	"time"

	"github.com/toolkits/pkg/net/gobrpc"
	"github.com/ugorji/go/codec"

	"github.com/ccfos/nightingale/v6/ibex/agentd/config"
)

var cli *gobrpc.RPCClient

func getCli() *gobrpc.RPCClient {
	if cli != nil {
		return cli
	}

	// detect the fastest server
	var (
		address  string
		client   *rpc.Client
		duration int64 = 999999999999
	)

	// auto close the other, slower servers
	acm := make(map[string]*rpc.Client)

	l := len(config.C.Heartbeat.Servers)
	for i := 0; i < l; i++ {
		addr := config.C.Heartbeat.Servers[i]
		begin := time.Now()
		conn, err := net.DialTimeout("tcp", addr, time.Second*5)
		if err != nil {
			log.Printf("W: dial %s fail: %s", addr, err)
			continue
		}

		var bufConn = struct {
			io.Closer
			*bufio.Reader
			*bufio.Writer
		}{conn, bufio.NewReader(conn), bufio.NewWriter(conn)}

		var mh codec.MsgpackHandle
		mh.MapType = reflect.TypeOf(map[string]interface{}(nil))

		rpcCodec := codec.MsgpackSpecRpc.ClientCodec(bufConn, &mh)
		c := rpc.NewClientWithCodec(rpcCodec)

		acm[addr] = c

		var out string
		err = c.Call("Server.Ping", "", &out)
		if err != nil {
			log.Printf("W: ping %s fail: %s", addr, err)
			continue
		}
		use := time.Since(begin).Nanoseconds()

		if use < duration {
			address = addr
			client = c
			duration = use
		}
	}

	if address == "" {
		log.Println("E: no job server found")
		return nil
	}

	log.Printf("I: choose server: %s, duration: %dms", address, duration/1000000)

	for addr, c := range acm {
		if addr == address {
			continue
		}
		c.Close()
	}

	cli = gobrpc.NewRPCClient(address, client, 5*time.Second)
	return cli
}

// GetCli probes the latency of every server and automatically picks the fastest one
func GetCli() *gobrpc.RPCClient {
	for {
		c := getCli()
		if c != nil {
			return c
		}

		time.Sleep(time.Second * 10)
	}
}

// CloseCli closes the client connection
func CloseCli() {
	if cli != nil {
		cli.Close()
		cli = nil
	}
}
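Stripped of the RPC plumbing, the probe loop above is an argmin over measured round-trip time: ping every candidate, remember the address with the smallest duration, close the rest. That selection rule can be sketched on its own (the helper name `pickFastest` and the sample addresses are ours, not from the source):

```go
package main

import "fmt"

// pickFastest returns the address with the smallest measured latency,
// the same selection rule getCli applies across its heartbeat servers.
func pickFastest(latencyNs map[string]int64) string {
	var best string
	var bestD int64 = 999999999999 // same sentinel initial value as getCli
	for addr, d := range latencyNs {
		if d < bestD {
			best = addr
			bestD = d
		}
	}
	return best
}

func main() {
	fmt.Println(pickFastest(map[string]int64{
		"10.0.0.1:20090": 1200,
		"10.0.0.2:20090": 300,
		"10.0.0.3:20090": 4500,
	})) // prints 10.0.0.2:20090
}
```

In the real client the candidates that lose the race still hold open TCP connections, which is why `getCli` tracks them in `acm` and closes every one except the winner.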
@@ -1,31 +0,0 @@
package client

import (
	"fmt"
	"log"

	"github.com/ccfos/nightingale/v6/ibex/types"
)

// Meta fetches task metadata from the server
func Meta(id int64) (script string, args string, account string, stdin string, err error) {
	var resp types.TaskMetaResponse
	err = GetCli().Call("Server.GetTaskMeta", id, &resp)
	if err != nil {
		log.Println("E: rpc call Server.GetTaskMeta:", err)
		CloseCli()
		return
	}

	if resp.Message != "" {
		log.Println("E: rpc call Server.GetTaskMeta:", resp.Message)
		err = fmt.Errorf(resp.Message)
		return
	}

	script = resp.Script
	args = resp.Args
	account = resp.Account
	stdin = resp.Stdin
	return
}
@@ -1,140 +0,0 @@
package config

import (
	"fmt"
	"log"
	"net"
	"os"
	"strings"
	"sync"

	"github.com/koding/multiconfig"
	"github.com/toolkits/pkg/file"

	"github.com/ccfos/nightingale/v6/pkg/httpx"
)

var (
	C    = new(Config)
	once sync.Once
)

func MustLoad(fpaths ...string) {
	once.Do(func() {
		loaders := []multiconfig.Loader{
			&multiconfig.TagLoader{},
			&multiconfig.EnvironmentLoader{},
		}

		for _, fpath := range fpaths {
			handled := false

			if strings.HasSuffix(fpath, "toml") {
				loaders = append(loaders, &multiconfig.TOMLLoader{Path: fpath})
				handled = true
			}
			if strings.HasSuffix(fpath, "conf") {
				loaders = append(loaders, &multiconfig.TOMLLoader{Path: fpath})
				handled = true
			}
			if strings.HasSuffix(fpath, "json") {
				loaders = append(loaders, &multiconfig.JSONLoader{Path: fpath})
				handled = true
			}
			if strings.HasSuffix(fpath, "yaml") {
				loaders = append(loaders, &multiconfig.YAMLLoader{Path: fpath})
				handled = true
			}

			if !handled {
				fmt.Println("config file invalid, valid file exts: .conf,.yaml,.toml,.json")
				os.Exit(1)
			}
		}

		m := multiconfig.DefaultLoader{
			Loader:    multiconfig.MultiLoader(loaders...),
			Validator: multiconfig.MultiValidator(&multiconfig.RequiredValidator{}),
		}

		m.MustLoad(C)

		if C.Heartbeat.Host == "" {
			fmt.Println("heartbeat.host is blank")
			os.Exit(1)
		}

		if C.Heartbeat.Host == "$ip" {
			C.Heartbeat.Endpoint = fmt.Sprint(GetOutboundIP())
			if C.Heartbeat.Endpoint == "" {
				fmt.Println("ip auto got is blank")
				os.Exit(1)
			}
			fmt.Println("host.ip:", C.Heartbeat.Endpoint)
		}

		host, err := C.GetHost()
		if err != nil {
			log.Println("E: failed to GetHost:", err)
			os.Exit(1)
		}

		fmt.Println("host:", host)

		if C.MetaDir == "" {
			C.MetaDir = "./meta"
		}

		C.MetaDir, err = file.RealPath(C.MetaDir)
		if err != nil {
			log.Println("E: failed to get real path of MetaDir:", err)
			os.Exit(1)
		}
		file.EnsureDir(C.MetaDir)
		file.EnsureDirRW(C.MetaDir)
	})
}

type Config struct {
	RunMode   string
	MetaDir   string
	Heartbeat Heartbeat
	HTTP      httpx.Config
}

type Heartbeat struct {
	Interval int64
	Servers  []string
	Host     string
	Endpoint string
}

func (c *Config) IsDebugMode() bool {
	return c.RunMode == "debug"
}

func (c *Config) GetHost() (string, error) {
	if c.Heartbeat.Host == "$ip" {
		return c.Heartbeat.Endpoint, nil
	}

	if c.Heartbeat.Host == "$hostname" {
		return os.Hostname()
	}

	return c.Heartbeat.Host, nil
}

// GetOutboundIP gets the preferred outbound IP of this machine
func GetOutboundIP() net.IP {
	conn, err := net.Dial("udp", "8.8.8.8:80")
	if err != nil {
		fmt.Println("auto get outbound ip fail:", err)
		os.Exit(1)
	}
	defer conn.Close()

	localAddr := conn.LocalAddr().(*net.UDPAddr)

	return localAddr.IP
}
@@ -1,60 +0,0 @@
package router

import (
	"fmt"
	"os"
	"strings"

	"github.com/gin-contrib/pprof"
	"github.com/gin-gonic/gin"

	"github.com/ccfos/nightingale/v6/ibex/agentd/config"
	"github.com/ccfos/nightingale/v6/pkg/aop"
)

func New(version string) *gin.Engine {
	gin.SetMode(config.C.RunMode)

	loggerMid := aop.Logger()
	recoveryMid := aop.Recovery()

	if strings.ToLower(config.C.RunMode) == "release" {
		aop.DisableConsoleColor()
	}

	r := gin.New()

	r.Use(recoveryMid)

	// whether to print access log
	if config.C.HTTP.PrintAccessLog {
		r.Use(loggerMid)
	}

	configRoute(r, version)

	return r
}

func configRoute(r *gin.Engine, version string) {
	if config.C.HTTP.PProf {
		pprof.Register(r, "/debug/pprof")
	}

	r.GET("/ping", func(c *gin.Context) {
		c.String(200, "pong")
	})

	r.GET("/pid", func(c *gin.Context) {
		c.String(200, fmt.Sprintf("%d", os.Getpid()))
	})

	r.GET("/addr", func(c *gin.Context) {
		c.String(200, c.Request.RemoteAddr)
	})

	r.GET("/version", func(c *gin.Context) {
		c.String(200, version)
	})
}
@@ -1,18 +0,0 @@
//go:build !windows
// +build !windows

package timer

import (
	"os/exec"
	"syscall"
)

func CmdStart(cmd *exec.Cmd) error {
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	return cmd.Start()
}

func CmdKill(cmd *exec.Cmd) error {
	return syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL)
}
@@ -1,16 +0,0 @@
//go:build windows
// +build windows

package timer

import (
	"os/exec"
)

func CmdStart(cmd *exec.Cmd) error {
	return cmd.Start()
}

func CmdKill(cmd *exec.Cmd) error {
	return cmd.Process.Kill()
}
@@ -1,74 +0,0 @@
package timer

import (
	"context"
	"log"
	"time"

	"github.com/ccfos/nightingale/v6/ibex/agentd/client"
	"github.com/ccfos/nightingale/v6/ibex/agentd/config"
	"github.com/ccfos/nightingale/v6/ibex/types"
)

func Heartbeat(ctx context.Context) {
	interval := time.Duration(config.C.Heartbeat.Interval) * time.Millisecond
	for {
		select {
		case <-ctx.Done():
			return
		case <-time.After(interval):
			heartbeat()
		}
	}
}

func heartbeat() {
	ident, err := config.C.GetHost()
	if err != nil {
		log.Println("E: GetHost fail:", err)
		return
	}

	req := types.ReportRequest{
		Ident:       ident,
		ReportTasks: Locals.ReportTasks(),
	}

	var resp types.ReportResponse
	err = client.GetCli().Call("Server.Report", req, &resp)
	if err != nil {
		log.Println("E: rpc call Server.Report fail:", err)
		client.CloseCli()
		return
	}

	if resp.Message != "" {
		log.Println("E: error from server:", resp.Message)
		return
	}

	assigned := make(map[int64]struct{})

	if resp.AssignTasks != nil {
		count := len(resp.AssignTasks)
		for i := 0; i < count; i++ {
			at := resp.AssignTasks[i]
			assigned[at.Id] = struct{}{}
			Locals.AssignTask(at)
		}
	}

	if len(assigned) > 0 {
		log.Println("D: assigned tasks:", mapKeys(assigned))
	}

	Locals.Clean(assigned)
}

func mapKeys(m map[int64]struct{}) []int64 {
	lst := make([]int64, 0, len(m))
	for k := range m {
		lst = append(lst, k)
	}
	return lst
}
@@ -1,333 +0,0 @@
package timer

import (
	"bytes"
	"fmt"
	"log"
	"os/exec"
	"os/user"
	"path"
	"strings"
	"sync"

	"github.com/toolkits/pkg/file"
	"github.com/toolkits/pkg/runner"
	"github.com/toolkits/pkg/sys"

	"github.com/ccfos/nightingale/v6/ibex/agentd/client"
	"github.com/ccfos/nightingale/v6/ibex/agentd/config"
)

type Task struct {
	sync.Mutex

	Id     int64
	Clock  int64
	Action string
	Status string

	alive  bool
	Cmd    *exec.Cmd
	Stdout bytes.Buffer
	Stderr bytes.Buffer
	Stdin  *bytes.Reader

	Args     string
	Account  string
	StdinStr string
}

func (t *Task) SetStatus(status string) {
	t.Lock()
	t.Status = status
	t.Unlock()
}

func (t *Task) GetStatus() string {
	t.Lock()
	s := t.Status
	t.Unlock()
	return s
}

func (t *Task) GetAlive() bool {
	t.Lock()
	pa := t.alive
	t.Unlock()
	return pa
}

func (t *Task) SetAlive(pa bool) {
	t.Lock()
	t.alive = pa
	t.Unlock()
}

func (t *Task) GetStdout() string {
	t.Lock()
	out := t.Stdout.String()
	t.Unlock()
	return out
}

func (t *Task) GetStderr() string {
	t.Lock()
	out := t.Stderr.String()
	t.Unlock()
	return out
}

func (t *Task) ResetBuff() {
	t.Lock()
	t.Stdout.Reset()
	t.Stderr.Reset()
	t.Unlock()
}

func (t *Task) doneBefore() bool {
	doneFlag := path.Join(config.C.MetaDir, fmt.Sprint(t.Id), fmt.Sprintf("%d.done", t.Clock))
	return file.IsExist(doneFlag)
}

func (t *Task) loadResult() {
	metadir := config.C.MetaDir

	doneFlag := path.Join(metadir, fmt.Sprint(t.Id), fmt.Sprintf("%d.done", t.Clock))
	stdoutFile := path.Join(metadir, fmt.Sprint(t.Id), "stdout")
	stderrFile := path.Join(metadir, fmt.Sprint(t.Id), "stderr")

	var err error

	t.Status, err = file.ReadStringTrim(doneFlag)
	if err != nil {
		log.Printf("E: read file %s fail %v", doneFlag, err)
	}
	stdout, err := file.ReadString(stdoutFile)
	if err != nil {
		log.Printf("E: read file %s fail %v", stdoutFile, err)
	}
	stderr, err := file.ReadString(stderrFile)
	if err != nil {
		log.Printf("E: read file %s fail %v", stderrFile, err)
	}

	t.Stdout = *bytes.NewBufferString(stdout)
	t.Stderr = *bytes.NewBufferString(stderr)
}

func (t *Task) prepare() error {
	if t.Account != "" {
		// already prepared
		return nil
	}

	IdDir := path.Join(config.C.MetaDir, fmt.Sprint(t.Id))
	err := file.EnsureDir(IdDir)
	if err != nil {
		log.Printf("E: mkdir -p %s fail: %v", IdDir, err)
		return err
	}

	writeFlag := path.Join(IdDir, ".write")
	if file.IsExist(writeFlag) {
		// read from local disk
		argsFile := path.Join(IdDir, "args")
		args, err := file.ReadStringTrim(argsFile)
		if err != nil {
			log.Printf("E: read %s fail %v", argsFile, err)
			return err
		}

		accountFile := path.Join(IdDir, "account")
		account, err := file.ReadStringTrim(accountFile)
		if err != nil {
			log.Printf("E: read %s fail %v", accountFile, err)
			return err
		}

		stdinFile := path.Join(IdDir, "stdin")
		stdin, err := file.ReadStringTrim(stdinFile)
		if err != nil {
			log.Printf("E: read %s fail %v", stdinFile, err)
			return err
		}

		t.Args = args
		t.Account = account
		t.StdinStr = stdin
	} else {
		// fetch from remote, then persist to disk
		script, args, account, stdin, err := client.Meta(t.Id)
		if err != nil {
			log.Println("E: query task meta fail:", err)
			return err
		}

		scriptFile := path.Join(IdDir, "script")
		_, err = file.WriteString(scriptFile, script)
		if err != nil {
			log.Printf("E: write script to %s fail: %v", scriptFile, err)
			return err
		}

		out, err := sys.CmdOutTrim("chmod", "+x", scriptFile)
		if err != nil {
			log.Printf("E: chmod +x %s fail %v. output: %s", scriptFile, err, out)
			return err
		}

		argsFile := path.Join(IdDir, "args")
		_, err = file.WriteString(argsFile, args)
		if err != nil {
			log.Printf("E: write args to %s fail: %v", argsFile, err)
			return err
		}

		accountFile := path.Join(IdDir, "account")
		_, err = file.WriteString(accountFile, account)
		if err != nil {
			log.Printf("E: write account to %s fail: %v", accountFile, err)
			return err
		}

		stdinFile := path.Join(IdDir, "stdin")
		_, err = file.WriteString(stdinFile, stdin)
		if err != nil {
			log.Printf("E: write stdin to %s fail: %v", stdinFile, err)
			return err
		}

		_, err = file.WriteString(writeFlag, "")
		if err != nil {
			log.Printf("E: create %s flag file fail: %v", writeFlag, err)
			return err
		}

		t.Args = args
		t.Account = account
		t.StdinStr = stdin
	}

	t.Stdin = bytes.NewReader([]byte(t.StdinStr))

	return nil
}

func (t *Task) start() {
	if t.GetAlive() {
		return
	}

	err := t.prepare()
	if err != nil {
		return
	}

	args := t.Args
	if args != "" {
		args = strings.Replace(args, ",,", "' '", -1)
		args = "'" + args + "'"
	}

	scriptFile := path.Join(config.C.MetaDir, fmt.Sprint(t.Id), "script")
	if !path.IsAbs(scriptFile) {
		scriptFile = path.Join(runner.Cwd, scriptFile)
	}

	sh := fmt.Sprintf("%s %s", scriptFile, args)
	var cmd *exec.Cmd

	loginUser, err := user.Current()
	if err != nil {
		log.Println("E: cannot get current login user:", err)
		return
	}

	if loginUser.Username == "root" {
		// current login user is root
		if t.Account == "root" {
			cmd = exec.Command("sh", "-c", sh)
			cmd.Dir = loginUser.HomeDir
		} else {
			cmd = exec.Command("su", "-c", sh, "-", t.Account)
		}
	} else {
		// current login user is not root
		cmd = exec.Command("sh", "-c", sh)
		cmd.Dir = loginUser.HomeDir
	}

	cmd.Stdout = &t.Stdout
	cmd.Stderr = &t.Stderr
	cmd.Stdin = t.Stdin

	t.Cmd = cmd

	err = CmdStart(cmd)
	if err != nil {
		log.Printf("E: cannot start cmd of task[%d]: %v", t.Id, err)
		return
	}

	go runProcess(t)
}

func (t *Task) kill() {
	go killProcess(t)
}

func runProcess(t *Task) {
	t.SetAlive(true)
	defer t.SetAlive(false)

	err := t.Cmd.Wait()
	if err != nil {
		if strings.Contains(err.Error(), "signal: killed") {
			t.SetStatus("killed")
			log.Printf("D: process of task[%d] killed", t.Id)
		} else if strings.Contains(err.Error(), "signal: terminated") {
			// child processes are killed manually
			t.SetStatus("killed")
			log.Printf("D: process of task[%d] terminated", t.Id)
		} else {
			t.SetStatus("failed")
			log.Printf("D: process of task[%d] return error: %v", t.Id, err)
		}
	} else {
		t.SetStatus("success")
		log.Printf("D: process of task[%d] done", t.Id)
	}

	persistResult(t)
}

func persistResult(t *Task) {
	metadir := config.C.MetaDir

	stdout := path.Join(metadir, fmt.Sprint(t.Id), "stdout")
	stderr := path.Join(metadir, fmt.Sprint(t.Id), "stderr")
	doneFlag := path.Join(metadir, fmt.Sprint(t.Id), fmt.Sprintf("%d.done", t.Clock))

	file.WriteString(stdout, t.GetStdout())
	file.WriteString(stderr, t.GetStderr())
	file.WriteString(doneFlag, t.GetStatus())
}

func killProcess(t *Task) {
	t.SetAlive(true)
	defer t.SetAlive(false)

	log.Printf("D: begin kill process of task[%d]", t.Id)

	err := CmdKill(t.Cmd)
	if err != nil {
		t.SetStatus("killfailed")
		log.Printf("D: kill process of task[%d] fail: %v", t.Id, err)
	} else {
		t.SetStatus("killed")
		log.Printf("D: process of task[%d] killed", t.Id)
	}

	persistResult(t)
}
@@ -1,120 +0,0 @@
package timer

import (
	"log"

	"github.com/ccfos/nightingale/v6/ibex/types"
)

type LocalTasksT struct {
	M map[int64]*Task
}

var Locals = &LocalTasksT{M: make(map[int64]*Task)}

func (lt *LocalTasksT) ReportTasks() []types.ReportTask {
	ret := make([]types.ReportTask, 0, len(lt.M))
	for id, t := range lt.M {
		rt := types.ReportTask{Id: id, Clock: t.Clock}

		rt.Status = t.GetStatus()
		if rt.Status == "running" || rt.Status == "killing" {
			// intermediate state
			continue
		}

		rt.Stdout = t.GetStdout()
		rt.Stderr = t.GetStderr()

		stdoutLen := len(rt.Stdout)
		stderrLen := len(rt.Stderr)

		// truncate overly long output, otherwise it would blow up the database
		if stdoutLen > 65535 {
			start := stdoutLen - 65535
			rt.Stdout = rt.Stdout[start:]
		}

		if stderrLen > 65535 {
			start := stderrLen - 65535
			rt.Stderr = rt.Stderr[start:]
		}

		ret = append(ret, rt)
	}

	return ret
}

func (lt *LocalTasksT) GetTask(id int64) (*Task, bool) {
	t, found := lt.M[id]
	return t, found
}

func (lt *LocalTasksT) SetTask(t *Task) {
	lt.M[t.Id] = t
}

func (lt *LocalTasksT) AssignTask(at types.AssignTask) {
	local, found := lt.GetTask(at.Id)
	if found {
		if local.Clock == at.Clock && local.Action == at.Action {
			// ignore repeated task
			return
		}

		local.Clock = at.Clock
		local.Action = at.Action
	} else {
		if at.Action == "kill" {
			// no process locally, nothing to kill
			return
		}
		local = &Task{
			Id:     at.Id,
			Clock:  at.Clock,
			Action: at.Action,
		}
		lt.SetTask(local)

		if local.doneBefore() {
			local.loadResult()
			return
		}
	}

	if local.Action == "kill" {
		local.SetStatus("killing")
		local.kill()
	} else if local.Action == "start" {
		local.SetStatus("running")
		local.start()
	} else {
		log.Printf("W: unknown action: %s of task %d", at.Action, at.Id)
	}
}

func (lt *LocalTasksT) Clean(assigned map[int64]struct{}) {
	del := make(map[int64]struct{})

	for id := range lt.M {
		if _, found := assigned[id]; !found {
			del[id] = struct{}{}
		}
	}

	for id := range del {
		// the remote side no longer tracks this task, but locally it still looks running;
		// the remote may have judged it timed out, so do not delete it here and keep reporting
		if lt.M[id].GetStatus() == "running" {
			continue
		}

		lt.M[id].ResetBuff()
		cmd := lt.M[id].Cmd
		delete(lt.M, id)
		if cmd != nil && cmd.Process != nil {
			cmd.Process.Release()
		}
	}
}
82
ibex/ibex.go
@@ -1,82 +0,0 @@
package ibex

import (
	"fmt"
	"os"
	"strings"

	"github.com/ccfos/nightingale/v6/ibex/server/config"
	"github.com/ccfos/nightingale/v6/ibex/server/router"
	"github.com/ccfos/nightingale/v6/ibex/server/rpc"
	"github.com/ccfos/nightingale/v6/ibex/server/timer"
	"github.com/ccfos/nightingale/v6/pkg/ctx"
	"github.com/ccfos/nightingale/v6/storage"

	"github.com/ccfos/nightingale/v6/alert/aconf"
	n9eRouter "github.com/ccfos/nightingale/v6/center/router"
	"github.com/ccfos/nightingale/v6/conf"
	n9eConf "github.com/ccfos/nightingale/v6/conf"
	"github.com/gin-gonic/gin"
	"github.com/redis/go-redis/v9"
	"gorm.io/gorm"
)

var (
	HttpPort int
)

func ServerStart(ctx *ctx.Context, isCenter bool, db *gorm.DB, rc redis.Cmdable, basicAuth gin.Accounts, heartbeat aconf.HeartbeatConfig,
	api *n9eConf.CenterApi, r *gin.Engine, centerRouter *n9eRouter.Router, ibex conf.Ibex, httpPort int) {
	config.C.IsCenter = isCenter
	config.C.BasicAuth = make(gin.Accounts)
	if len(basicAuth) > 0 {
		config.C.BasicAuth = basicAuth
	}

	config.C.Heartbeat.IP = heartbeat.IP
	config.C.Heartbeat.Interval = heartbeat.Interval
	config.C.Heartbeat.LocalAddr = schedulerAddrGet(ibex.RPCListen)
	HttpPort = httpPort

	config.C.Output.ComeFrom = ibex.Output.ComeFrom
	config.C.Output.AgtdPort = ibex.Output.AgtdPort

	rou := router.NewRouter(ctx)

	if centerRouter != nil {
		rou.ConfigRouter(r, centerRouter)
	} else {
		rou.ConfigRouter(r)
	}

	ctx.Redis = rc
	if err := storage.IdInit(ctx.Redis); err != nil {
		fmt.Println("cannot init id generator: ", err)
		os.Exit(1)
	}

	rpc.Start(ibex.RPCListen, ctx)

	if isCenter {
		go timer.Heartbeat(ctx)
		go timer.Schedule(ctx)
		go timer.CleanLong(ctx)
	} else {
		config.C.CenterApi = *api
	}

	timer.CacheHostDoing(ctx)
	timer.ReportResult(ctx)
}

func schedulerAddrGet(rpcListen string) string {
	ip := fmt.Sprint(config.GetOutboundIP())
	if ip == "" {
		fmt.Println("heartbeat ip auto got is blank")
		os.Exit(1)
	}

	port := strings.Split(rpcListen, ":")[1]
	localAddr := ip + ":" + port
	return localAddr
}
@@ -1,135 +0,0 @@
package config

import (
	"fmt"
	"net"
	"os"
	"strings"
	"sync"

	"github.com/ccfos/nightingale/v6/pkg/httpx"
	"github.com/ccfos/nightingale/v6/pkg/logx"

	"github.com/ccfos/nightingale/v6/conf"
	"github.com/ccfos/nightingale/v6/pkg/ormx"
	"github.com/ccfos/nightingale/v6/storage"
	"github.com/gin-gonic/gin"
	"github.com/koding/multiconfig"
)

var (
	C    = new(Config)
	once sync.Once
)

func MustLoad(fpaths ...string) {
	once.Do(func() {
		loaders := []multiconfig.Loader{
			&multiconfig.TagLoader{},
			&multiconfig.EnvironmentLoader{},
		}

		for _, fpath := range fpaths {
			handled := false

			if strings.HasSuffix(fpath, "toml") {
				loaders = append(loaders, &multiconfig.TOMLLoader{Path: fpath})
				handled = true
			}
			if strings.HasSuffix(fpath, "conf") {
				loaders = append(loaders, &multiconfig.TOMLLoader{Path: fpath})
				handled = true
			}
			if strings.HasSuffix(fpath, "json") {
				loaders = append(loaders, &multiconfig.JSONLoader{Path: fpath})
				handled = true
			}
			if strings.HasSuffix(fpath, "yaml") {
				loaders = append(loaders, &multiconfig.YAMLLoader{Path: fpath})
				handled = true
			}

			if !handled {
				fmt.Println("config file invalid, valid file exts: .conf,.yaml,.toml,.json")
				os.Exit(1)
			}
		}

		m := multiconfig.DefaultLoader{
			Loader:    multiconfig.MultiLoader(loaders...),
			Validator: multiconfig.MultiValidator(&multiconfig.RequiredValidator{}),
		}

		m.MustLoad(C)

		if C.Heartbeat.IP == "" {
			// auto detect
			C.Heartbeat.IP = fmt.Sprint(GetOutboundIP())

			if C.Heartbeat.IP == "" {
				fmt.Println("heartbeat ip auto got is blank")
				os.Exit(1)
			}
		}

		port := strings.Split(C.RPC.Listen, ":")[1]
		endpoint := C.Heartbeat.IP + ":" + port
		C.Heartbeat.LocalAddr = endpoint

		// normally this is not 127.0.0.1, but it can be in a standalone deployment
		// on a machine without network, e.g. local debugging while offline
		// if C.Heartbeat.IP == "127.0.0.1" {
		// 	fmt.Println("heartbeat ip is 127.0.0.1 and it is useless, so, exit")
		// 	os.Exit(1)
		// }

		fmt.Println("heartbeat.ip:", C.Heartbeat.IP)
		fmt.Printf("heartbeat.interval: %dms\n", C.Heartbeat.Interval)
	})
}

type Config struct {
	RunMode   string
	RPC       RPC
	Heartbeat Heartbeat
	Output    Output
	IsCenter  bool
	CenterApi conf.CenterApi
	Log       logx.Config
	HTTP      httpx.Config
	BasicAuth gin.Accounts
	DB        ormx.DBConfig
	Redis     storage.RedisConfig
}

type RPC struct {
	Listen string
}

type Heartbeat struct {
	IP        string
	Interval  int64
	LocalAddr string
}

type Output struct {
	ComeFrom string
	AgtdPort int
}

func (c *Config) IsDebugMode() bool {
	return c.RunMode == "debug"
}

// GetOutboundIP gets the preferred outbound IP of this machine
func GetOutboundIP() net.IP {
	conn, err := net.Dial("udp", "8.8.8.8:80")
	if err != nil {
		fmt.Println("auto get outbound ip fail:", err)
		os.Exit(1)
	}
	defer conn.Close()

	localAddr := conn.LocalAddr().(*net.UDPAddr)

	return localAddr.IP
}
@@ -1,144 +0,0 @@
package logic

import (
	"github.com/ccfos/nightingale/v6/models"
	"github.com/ccfos/nightingale/v6/pkg/ctx"

	"github.com/toolkits/pkg/logger"
	"github.com/toolkits/pkg/slice"
	"github.com/toolkits/pkg/str"
)

func ScheduleTask(ctx *ctx.Context, id int64) {
	logger.Debugf("task[%d] scheduling...", id)

	count, err := models.WaitingHostCount(ctx, id)
	if err != nil {
		logger.Errorf("cannot get task[%d] waiting host count: %v", id, err)
		return
	}

	if count == 0 {
		cleanDoneTask(ctx, id)
		return
	}

	action, err := models.TaskActionGet(ctx, "id=?", id)
	if err != nil {
		logger.Errorf("cannot get task[%d] action: %v", id, err)
		return
	}

	if action == nil {
		logger.Errorf("[W] no action found of task[%d]", id)
		return
	}

	switch action.Action {
	case "start":
		startTask(ctx, id, action)
	case "pause":
		return
	case "cancel":
		return
	case "kill":
		return
	default:
		logger.Errorf("unknown action: %s of task[%d]", action.Action, id)
	}
}

func cleanDoneTask(ctx *ctx.Context, id int64) {
	ingCount, err := models.IngStatusHostCount(ctx, id)
	if err != nil {
		logger.Errorf("cannot get task[%d] ing status host count: %v", id, err)
		return
	}

	if ingCount > 0 {
		return
	}

	err = models.CleanDoneTask(ctx, id)
	if err != nil {
		logger.Errorf("cannot clean done task[%d]: %v", id, err)
	}

	logger.Debugf("task[%d] done", id)
}

func startTask(ctx *ctx.Context, id int64, action *models.TaskAction) {
	meta, err := models.TaskMetaGetByID(ctx, id)
	if err != nil {
		logger.Errorf("cannot get task[%d] meta: %v", id, err)
		return
	}

	if meta == nil {
		logger.Errorf("task[%d] meta lost", id)
		return
	}

	count, err := models.UnexpectedHostCount(ctx, id)
	if err != nil {
		logger.Errorf("cannot get task[%d] unexpected host count: %v", id, err)
		return
	}

	if count > int64(meta.Tolerance) {
		err = action.Update(ctx, "pause")
		if err != nil {
			logger.Errorf("cannot update task[%d] action to 'pause': %v", id, err)
		}

		return
	}

	waitings, err := models.WaitingHostList(ctx, id)
	if err != nil {
		logger.Errorf("cannot get task[%d] waiting host: %v", id, err)
		return
	}

	waitingsCount := len(waitings)
	if waitingsCount == 0 {
		return
	}

	doingsCount, err := models.TableRecordCount(ctx, models.TaskHostDoing{}.TableName(), "id=?", id)
	if err != nil {
		logger.Errorf("cannot get task[%d] doing host count: %v", id, err)
		return
	}

	need := meta.Batch - int(doingsCount)
	if meta.Batch == 0 {
		need = waitingsCount
	}

	if need <= 0 {
		return
	}

	if need > waitingsCount {
		need = waitingsCount
	}

	arr := str.ParseCommaTrim(meta.Pause)
	end := need
	for i := 0; i < need; i++ {
		if slice.ContainsString(arr, waitings[i].Host) {
			end = i + 1
			err = action.Update(ctx, "pause")
			if err != nil {
				logger.Errorf("cannot update task[%d] action to 'pause': %v", id, err)
				return
			}
			break
		}
	}

	err = models.RunWaitingHosts(ctx, waitings[:end])
	if err != nil {
		logger.Errorf("cannot run waiting hosts: %v", err)
	}
}
@@ -1,45 +0,0 @@
package logic

import (
	"time"

	"github.com/ccfos/nightingale/v6/models"
	"github.com/ccfos/nightingale/v6/pkg/ctx"

	"github.com/toolkits/pkg/logger"
)

func CheckTimeout(ctx *ctx.Context, id int64) {
	meta, err := models.TaskMetaGetByID(ctx, id)
	if err != nil {
		logger.Errorf("cannot get task[%d] meta: %v", id, err)
		return
	}

	if meta == nil {
		logger.Errorf("task[%d] meta lost", id)
		return
	}

	hosts, err := models.TableRecordGets[[]models.TaskHostDoing](ctx, models.TaskHostDoing{}.TableName(), "id=?", id)
	if err != nil {
		logger.Errorf("cannot get task[%d] doing host list: %v", id, err)
		return
	}

	count := len(hosts)
	if count == 0 {
		return
	}

	// 3s: task dispatch duration: web -> db -> scheduler -> executor
	timeout := int64(meta.Timeout + 3)
	now := time.Now().Unix()
	for i := 0; i < count; i++ {
		if now-hosts[i].Clock > timeout {
			err = models.MarkDoneStatus(ctx, hosts[i].Id, hosts[i].Clock, hosts[i].Host, "timeout", "", "")
			if err != nil {
				logger.Errorf("cannot mark task[%d] done status: %v", id, err)
			}
		}
	}
}
@@ -1,40 +0,0 @@
package router

import (
	"net/http"
	"strings"

	"github.com/ccfos/nightingale/v6/models"
	"github.com/ccfos/nightingale/v6/pkg/ctx"

	"github.com/toolkits/pkg/errorx"
)

func TaskMeta(ctx *ctx.Context, id int64) *models.TaskMeta {
	obj, err := models.TaskMetaGet(ctx, "id = ?", id)
	errorx.Dangerous(err)

	if obj == nil {
		errorx.Bomb(http.StatusNotFound, "no such task meta")
	}

	return obj
}

func cleanHosts(formHosts []string) []string {
	cnt := len(formHosts)
	arr := make([]string, 0, cnt)
	for i := 0; i < cnt; i++ {
		item := strings.TrimSpace(formHosts[i])
		if item == "" {
			continue
		}

		if strings.HasPrefix(item, "#") {
			continue
		}

		arr = append(arr, item)
	}

	return arr
}
@@ -1,612 +0,0 @@
package router

import (
	"fmt"
	"io/ioutil"
	"net/http"
	"strconv"
	"time"

	"github.com/ccfos/nightingale/v6/ibex/server/config"
	"github.com/ccfos/nightingale/v6/models"
	"github.com/ccfos/nightingale/v6/pkg/ctx"
	"github.com/ccfos/nightingale/v6/storage"

	"github.com/gin-gonic/gin"
	"github.com/toolkits/pkg/errorx"
	"github.com/toolkits/pkg/ginx"
	"github.com/toolkits/pkg/logger"
	"github.com/toolkits/pkg/slice"
	"github.com/toolkits/pkg/str"
)

func (rou *Router) taskStdout(c *gin.Context) {
	meta := TaskMeta(rou.ctx, UrlParamsInt64(c, "id"))
	stdouts, err := meta.Stdouts(rou.ctx)
	ginx.NewRender(c).Data(stdouts, err)
}

func (rou *Router) taskStderr(c *gin.Context) {
	meta := TaskMeta(rou.ctx, UrlParamsInt64(c, "id"))
	stderrs, err := meta.Stderrs(rou.ctx)
	ginx.NewRender(c).Data(stderrs, err)
}

// TODO: checking task_action alone is not enough; the execution status of every host should be considered as well
func (rou *Router) taskState(c *gin.Context) {
	action, err := models.TaskActionGet(rou.ctx, "id=?", UrlParamsInt64(c, "id"))
	if err != nil {
		ginx.NewRender(c).Data("", err)
		return
	}

	state := "done"
	if action != nil {
		state = action.Action
	}

	ginx.NewRender(c).Data(state, err)
}

func (rou *Router) taskResult(c *gin.Context) {
	id := UrlParamsInt64(c, "id")

	hosts, err := models.TaskHostStatus(rou.ctx, id)
	if err != nil {
		errorx.Bomb(500, "load task hosts of %d occur error %v", id, err)
	}

	ss := make(map[string][]string)
	total := len(hosts)
	for i := 0; i < total; i++ {
		s := hosts[i].Status
		ss[s] = append(ss[s], hosts[i].Host)
	}

	ginx.NewRender(c).Data(ss, nil)
}

func (rou *Router) taskHostOutput(c *gin.Context) {
	obj, err := models.TaskHostGet(rou.ctx, UrlParamsInt64(c, "id"), ginx.UrlParamStr(c, "host"))
	ginx.NewRender(c).Data(obj, err)
}

func (rou *Router) taskHostStdout(c *gin.Context) {
	id := UrlParamsInt64(c, "id")
	host := ginx.UrlParamStr(c, "host")

	if config.C.Output.ComeFrom == "database" || config.C.Output.ComeFrom == "" {
		obj, err := models.TaskHostGet(rou.ctx, id, host)
		ginx.NewRender(c).Data(obj.Stdout, err)
		return
	}

	if config.C.Output.AgtdPort <= 0 || config.C.Output.AgtdPort > 65535 {
		ginx.NewRender(c).Message(fmt.Errorf("remotePort(%d) invalid", config.C.Output.AgtdPort))
		return
	}

	url := fmt.Sprintf("http://%s:%d/output/%d/stdout.json", host, config.C.Output.AgtdPort, id)
	client := &http.Client{
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			return http.ErrUseLastResponse
		},
	}
	resp, err := client.Get(url)
	errorx.Dangerous(err)

	defer resp.Body.Close()

	bs, err := ioutil.ReadAll(resp.Body)
	errorx.Dangerous(err)

	c.Writer.Header().Set("Content-Type", "application/json; charset=UTF-8")
	c.Writer.Write(bs)
}

func (rou *Router) taskHostStderr(c *gin.Context) {
	id := UrlParamsInt64(c, "id")
	host := ginx.UrlParamStr(c, "host")

	if config.C.Output.ComeFrom == "database" || config.C.Output.ComeFrom == "" {
		obj, err := models.TaskHostGet(rou.ctx, id, host)
		ginx.NewRender(c).Data(obj.Stderr, err)
		return
	}

	if config.C.Output.AgtdPort <= 0 || config.C.Output.AgtdPort > 65535 {
		ginx.NewRender(c).Message(fmt.Errorf("remotePort(%d) invalid", config.C.Output.AgtdPort))
		return
	}

	url := fmt.Sprintf("http://%s:%d/output/%d/stderr.json", host, config.C.Output.AgtdPort, id)
	client := &http.Client{
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			return http.ErrUseLastResponse
		},
	}
	resp, err := client.Get(url)
	errorx.Dangerous(err)

	defer resp.Body.Close()

	bs, err := ioutil.ReadAll(resp.Body)
	errorx.Dangerous(err)

	c.Writer.Header().Set("Content-Type", "application/json; charset=UTF-8")
	c.Writer.Write(bs)
}

func (rou *Router) taskStdoutTxt(c *gin.Context) {
	id := UrlParamsInt64(c, "id")

	meta, err := models.TaskMetaGet(rou.ctx, "id = ?", id)
	if err != nil {
		c.String(500, err.Error())
		return
	}

	if meta == nil {
		c.String(404, "no such task")
		return
	}

	stdouts, err := meta.Stdouts(rou.ctx)
	if err != nil {
		c.String(500, err.Error())
		return
	}

	w := c.Writer

	w.Header().Set("Content-Type", "text/plain; charset=utf-8")
	count := len(stdouts)
	for i := 0; i < count; i++ {
		if i != 0 {
			w.Write([]byte("\n\n"))
		}

		w.Write([]byte(stdouts[i].Host + ":\n"))
		w.Write([]byte(stdouts[i].Stdout))
	}
}

func (rou *Router) taskStderrTxt(c *gin.Context) {
	id := UrlParamsInt64(c, "id")

	meta, err := models.TaskMetaGet(rou.ctx, "id = ?", id)
	if err != nil {
		c.String(500, err.Error())
		return
	}

	if meta == nil {
		c.String(404, "no such task")
		return
	}

	stderrs, err := meta.Stderrs(rou.ctx)
	if err != nil {
		c.String(500, err.Error())
		return
	}

	w := c.Writer

	w.Header().Set("Content-Type", "text/plain; charset=utf-8")
	count := len(stderrs)
	for i := 0; i < count; i++ {
		if i != 0 {
			w.Write([]byte("\n\n"))
		}

		w.Write([]byte(stderrs[i].Host + ":\n"))
		w.Write([]byte(stderrs[i].Stderr))
	}
}

type TaskStdoutData struct {
	Host   string `json:"host"`
	Stdout string `json:"stdout"`
}

type TaskStderrData struct {
	Host   string `json:"host"`
	Stderr string `json:"stderr"`
}

func (rou *Router) taskStdoutJSON(c *gin.Context) {
	task := TaskMeta(rou.ctx, UrlParamsInt64(c, "id"))

	host := ginx.QueryStr(c, "host", "")

	var hostsLen int
	var ret []TaskStdoutData

	if host != "" {
		obj, err := models.TaskHostGet(rou.ctx, task.Id, host)
		if err != nil {
			ginx.NewRender(c).Data("", err)
			return
		} else if obj == nil {
			ginx.NewRender(c).Data("", fmt.Errorf("task: %d, host(%s) not exists", task.Id, host))
			return
		} else {
			ret = append(ret, TaskStdoutData{
				Host:   host,
				Stdout: obj.Stdout,
			})
		}
	} else {
		hosts, err := models.TaskHostGets(rou.ctx, task.Id)
		if err != nil {
			ginx.NewRender(c).Data("", err)
			return
		}

		hostsLen = len(hosts)

		ret = make([]TaskStdoutData, 0, hostsLen)
		for i := 0; i < hostsLen; i++ {
			ret = append(ret, TaskStdoutData{
				Host:   hosts[i].Host,
				Stdout: hosts[i].Stdout,
			})
		}
	}

	ginx.NewRender(c).Data(ret, nil)
}

func (rou *Router) taskStderrJSON(c *gin.Context) {
	task := TaskMeta(rou.ctx, UrlParamsInt64(c, "id"))

	host := ginx.QueryStr(c, "host", "")

	var hostsLen int
	var ret []TaskStderrData

	if host != "" {
		obj, err := models.TaskHostGet(rou.ctx, task.Id, host)
		if err != nil {
			ginx.NewRender(c).Data("", err)
			return
		} else if obj == nil {
			ginx.NewRender(c).Data("", fmt.Errorf("task: %d, host(%s) not exists", task.Id, host))
			return
		} else {
			ret = append(ret, TaskStderrData{
				Host:   host,
				Stderr: obj.Stderr,
			})
		}
	} else {
		hosts, err := models.TaskHostGets(rou.ctx, task.Id)
		if err != nil {
			ginx.NewRender(c).Data("", err)
			return
		}

		hostsLen = len(hosts)

		ret = make([]TaskStderrData, 0, hostsLen)
		for i := 0; i < hostsLen; i++ {
			ret = append(ret, TaskStderrData{
				Host:   hosts[i].Host,
				Stderr: hosts[i].Stderr,
			})
		}
	}

	ginx.NewRender(c).Data(ret, nil)
}

type taskForm struct {
	Title          string   `json:"title" binding:"required"`
	Account        string   `json:"account" binding:"required"`
	Batch          int      `json:"batch"`
	Tolerance      int      `json:"tolerance"`
	Timeout        int      `json:"timeout"`
	Pause          string   `json:"pause"`
	Script         string   `json:"script" binding:"required"`
	Args           string   `json:"args"`
	Stdin          string   `json:"stdin"`
	Action         string   `json:"action" binding:"required"`
	Creator        string   `json:"creator" binding:"required"`
	Hosts          []string `json:"hosts" binding:"required"`
	AlertTriggered bool     `json:"alert_triggered"`
}

func (rou *Router) taskAdd(c *gin.Context) {
	var f taskForm
	ginx.BindJSON(c, &f)

	hosts := cleanHosts(f.Hosts)
	if len(hosts) == 0 {
		errorx.Bomb(http.StatusBadRequest, "arg(hosts) empty")
	}

	taskMeta := &models.TaskMeta{
		Title:     f.Title,
		Account:   f.Account,
		Batch:     f.Batch,
		Tolerance: f.Tolerance,
		Timeout:   f.Timeout,
		Pause:     f.Pause,
		Script:    f.Script,
		Args:      f.Args,
		Stdin:     f.Stdin,
		Creator:   f.Creator,
	}

	err := taskMeta.CleanFields()
	ginx.Dangerous(err)
	taskMeta.HandleFH(hosts[0])

	authUser := c.MustGet(gin.AuthUserKey).(string)
	// Tasks come in two flavors: triggered by an alert rule, or dispatched by a user from n9e center.
	// An alert-triggered task in an edge datacenter needs no scheduling, and the edge may be disconnected
	// and unable to reach the db, so the task is cached in redis and dispatched straight to agentd.
	if !config.C.IsCenter && f.AlertTriggered {
		if err := taskMeta.Create(rou.ctx); err != nil {
			// When the network is down, generate a unique id to avoid collisions between tasks in the edge datacenter.
			// A redis auto-increment id keeps different n9e edge instances in the same datacenter from producing
			// the same id, but it cannot prevent collisions across different edge datacenters; therefore tasks with
			// generated ids are never reported back to the database and are only used for closed-loop execution.
			taskMeta.Id, err = storage.IdGet(rou.ctx.Redis)
			ginx.Dangerous(err)
		}
		if err == nil {
			taskHost := models.TaskHost{
				Id:     taskMeta.Id,
				Host:   hosts[0],
				Status: "running",
			}
			if err = taskHost.Create(rou.ctx); err != nil {
				logger.Warningf("task_add_fail: authUser=%s title=%s err=%s", authUser, taskMeta.Title, err.Error())
			}
		}

		// cache the task meta and the task waiting to be dispatched
		err = taskMeta.Cache(rou.ctx, hosts[0])
		ginx.Dangerous(err)

	} else {
		// in the center datacenter, keep the original logic
		err = taskMeta.Save(rou.ctx, hosts, f.Action)
		ginx.Dangerous(err)
	}

	logger.Infof("task_add_succ: authUser=%s title=%s", authUser, taskMeta.Title)

	ginx.NewRender(c).Data(taskMeta.Id, err)
}

func (rou *Router) taskGet(c *gin.Context) {
	meta := TaskMeta(rou.ctx, UrlParamsInt64(c, "id"))

	hosts, err := meta.Hosts(rou.ctx)
	errorx.Dangerous(err)

	action, err := meta.Action(rou.ctx)
	errorx.Dangerous(err)

	actionStr := ""
	if action != nil {
		actionStr = action.Action
	} else {
		meta.Done = true
	}

	ginx.NewRender(c).Data(gin.H{
		"meta":   meta,
		"hosts":  hosts,
		"action": actionStr,
	}, nil)
}

// given a batch of ids, return the ids of tasks that are already done
func (rou *Router) doneIds(c *gin.Context) {
	ids := ginx.QueryStr(c, "ids", "")
	if ids == "" {
		errorx.Dangerous("arg(ids) empty")
	}

	idsint64 := str.IdsInt64(ids, ",")
	if len(idsint64) == 0 {
		errorx.Dangerous("arg(ids) empty")
	}

	exists, err := models.TaskActionExistsIds(rou.ctx, idsint64)
	errorx.Dangerous(err)

	dones := slice.SubInt64(idsint64, exists)
	ginx.NewRender(c).Data(gin.H{
		"list": dones,
	}, nil)
}

func (rou *Router) taskGets(c *gin.Context) {
	query := ginx.QueryStr(c, "query", "")
	limit := ginx.QueryInt(c, "limit", 20)
	creator := ginx.QueryStr(c, "creator", "")
	days := ginx.QueryInt64(c, "days", 7)

	before := time.Unix(time.Now().Unix()-days*24*3600, 0)

	total, err := models.TaskMetaTotal(rou.ctx, creator, query, before)
	errorx.Dangerous(err)

	list, err := models.TaskMetaGets(rou.ctx, creator, query, before, limit, ginx.Offset(c, limit))
	errorx.Dangerous(err)

	cnt := len(list)
	ids := make([]int64, cnt)
	for i := 0; i < cnt; i++ {
		ids[i] = list[i].Id
	}

	exists, err := models.TaskActionExistsIds(rou.ctx, ids)
	errorx.Dangerous(err)

	for i := 0; i < cnt; i++ {
		if slice.ContainsInt64(exists, list[i].Id) {
			list[i].Done = false
		} else {
			list[i].Done = true
		}
	}

	ginx.NewRender(c).Data(gin.H{
		"total": total,
		"list":  list,
	}, nil)
}

type actionForm struct {
	Action string `json:"action"`
}

func (rou *Router) taskAction(c *gin.Context) {
	meta := TaskMeta(rou.ctx, UrlParamsInt64(c, "id"))

	var f actionForm
	ginx.BindJSON(c, &f)

	action, err := models.TaskActionGet(rou.ctx, "id=?", meta.Id)
	errorx.Dangerous(err)

	if action == nil {
		errorx.Bomb(200, "task already finished, no more action can do")
	}

	ginx.NewRender(c).Message(action.Update(rou.ctx, f.Action))
}

func (rou *Router) taskHostAction(c *gin.Context) {
	host := ginx.UrlParamStr(c, "host")
	meta := TaskMeta(rou.ctx, UrlParamsInt64(c, "id"))

	noopWhenDone(rou.ctx, meta.Id)

	var f actionForm
	ginx.BindJSON(c, &f)

	if f.Action == "ignore" {
		errorx.Dangerous(meta.IgnoreHost(rou.ctx, host))

		action, err := models.TaskActionGet(rou.ctx, "id=?", meta.Id)
		errorx.Dangerous(err)

		if action != nil && action.Action == "pause" {
			ginx.NewRender(c).Data("you can click start to run the task", nil)
			return
		}
	}

	if f.Action == "kill" {
		errorx.Dangerous(meta.KillHost(rou.ctx, host))
	}

	if f.Action == "redo" {
		errorx.Dangerous(meta.RedoHost(rou.ctx, host))
	}

	ginx.NewRender(c).Message(nil)
}

func noopWhenDone(ctx *ctx.Context, id int64) {
	action, err := models.TaskActionGet(ctx, "id=?", id)
	errorx.Dangerous(err)

	if action == nil {
		errorx.Bomb(200, "task already finished, no more taskAction can do")
	}
}

type sqlCondForm struct {
	Table string
	Where string
	Args  []interface{}
}

func (rou *Router) tableRecordListGet(c *gin.Context) {
	var f sqlCondForm
	ginx.BindJSON(c, &f)
	switch f.Table {
	case models.TaskHostDoing{}.TableName():
		lst, err := models.TableRecordGets[[]models.TaskHostDoing](rou.ctx, f.Table, f.Where, f.Args)
		ginx.NewRender(c).Data(lst, err)
	case models.TaskMeta{}.TableName():
		lst, err := models.TableRecordGets[[]models.TaskMeta](rou.ctx, f.Table, f.Where, f.Args)
		ginx.NewRender(c).Data(lst, err)
	default:
		ginx.Bomb(http.StatusBadRequest, "table[%v] not support", f.Table)
	}
}

func (rou *Router) tableRecordCount(c *gin.Context) {
	var f sqlCondForm
	ginx.BindJSON(c, &f)
	ginx.NewRender(c).Data(models.TableRecordCount(rou.ctx, f.Table, f.Where, f.Args))
}

type markDoneForm struct {
	Id     int64
	Clock  int64
	Host   string
	Status string
	Stdout string
	Stderr string
}

func (rou *Router) markDone(c *gin.Context) {
	var f markDoneForm
	ginx.BindJSON(c, &f)
	ginx.NewRender(c).Message(models.MarkDoneStatus(rou.ctx, f.Id, f.Clock, f.Host, f.Status, f.Stdout, f.Stderr))
}

func (rou *Router) taskMetaAdd(c *gin.Context) {
	var f models.TaskMeta
	ginx.BindJSON(c, &f)
	err := f.Create(rou.ctx)
	ginx.NewRender(c).Data(f.Id, err)
}

func (rou *Router) taskHostAdd(c *gin.Context) {
	var f models.TaskHost
	ginx.BindJSON(c, &f)
	ginx.NewRender(c).Message(f.Upsert(rou.ctx))
}

func (rou *Router) taskHostUpsert(c *gin.Context) {
	var f []models.TaskHost
	ginx.BindJSON(c, &f)
	ginx.NewRender(c).Data(models.TaskHostUpserts(rou.ctx, f))
}

func UrlParamsInt64(c *gin.Context, field string) int64 {
	var params []gin.Param
	for _, p := range c.Params {
		if p.Key == "id" {
			params = append(params, p)
		}
	}

	var strval string
	if len(params) == 1 {
		strval = ginx.UrlParamStr(c, field)
	} else if len(params) == 2 {
		strval = params[1].Value
	} else {
		logger.Warningf("url param[%+v] not ok", params)
		errorx.Bomb(http.StatusBadRequest, "url param[%s] is blank", field)
	}

	intval, err := strconv.ParseInt(strval, 10, 64)
	if err != nil {
		errorx.Bomb(http.StatusBadRequest, "cannot convert %s to int64", strval)
	}

	return intval
}
@@ -1,132 +0,0 @@
package router

import (
	"fmt"
	"os"
	"strings"

	"github.com/ccfos/nightingale/v6/center/router"
	"github.com/ccfos/nightingale/v6/ibex/server/config"
	"github.com/ccfos/nightingale/v6/pkg/aop"
	"github.com/ccfos/nightingale/v6/pkg/ctx"

	"github.com/gin-contrib/pprof"
	"github.com/gin-gonic/gin"
)

func New(ctx *ctx.Context, version string) *gin.Engine {
	gin.SetMode(config.C.RunMode)

	loggerMid := aop.Logger()
	recoveryMid := aop.Recovery()

	if strings.ToLower(config.C.RunMode) == "release" {
		aop.DisableConsoleColor()
	}

	r := gin.New()

	r.Use(recoveryMid)

	// whether to print the access log
	if config.C.HTTP.PrintAccessLog {
		r.Use(loggerMid)
	}

	rou := NewRouter(ctx)

	rou.configBaseRouter(r, version)
	rou.ConfigRouter(r)

	return r
}

type Router struct {
	ctx *ctx.Context
}

func NewRouter(ctx *ctx.Context) *Router {
	return &Router{
		ctx: ctx,
	}
}

func (rou *Router) configBaseRouter(r *gin.Engine, version string) {
	if config.C.HTTP.PProf {
		pprof.Register(r, "/debug/pprof")
	}

	r.GET("/ping", func(c *gin.Context) {
		c.String(200, "pong")
	})

	r.GET("/pid", func(c *gin.Context) {
		c.String(200, fmt.Sprintf("%d", os.Getpid()))
	})

	r.GET("/addr", func(c *gin.Context) {
		c.String(200, c.Request.RemoteAddr)
	})

	r.GET("/version", func(c *gin.Context) {
		c.String(200, version)
	})
}

func (rou *Router) ConfigRouter(r *gin.Engine, rts ...*router.Router) {
	if len(rts) > 0 {
		rt := rts[0]
		pagesPrefix := "/api/n9e/busi-group/:id"
		pages := r.Group(pagesPrefix)
		{
			pages.GET("/task/:id", rt.Auth(), rt.User(), rt.Perm("/job-tasks"), rou.taskGet)
			pages.PUT("/task/:id/action", rt.Auth(), rt.User(), rt.Perm("/job-tasks/put"), rt.Bgrw(), rou.taskAction)
			pages.GET("/task/:id/stdout", rt.Auth(), rt.User(), rt.Perm("/job-tasks"), rou.taskStdout)
			pages.GET("/task/:id/stderr", rt.Auth(), rt.User(), rt.Perm("/job-tasks"), rou.taskStderr)
			pages.GET("/task/:id/state", rt.Auth(), rt.User(), rt.Perm("/job-tasks"), rou.taskState)
			pages.GET("/task/:id/result", rt.Auth(), rt.User(), rt.Perm("/job-tasks"), rou.taskResult)
			pages.PUT("/task/:id/host/:host/action", rt.Auth(), rt.User(), rt.Perm("/job-tasks/put"), rt.Bgrw(), rou.taskHostAction)
			pages.GET("/task/:id/host/:host/output", rt.Auth(), rt.User(), rt.Perm("/job-tasks"), rou.taskHostOutput)
			pages.GET("/task/:id/host/:host/stdout", rt.Auth(), rt.User(), rt.Perm("/job-tasks"), rou.taskHostStdout)
			pages.GET("/task/:id/host/:host/stderr", rt.Auth(), rt.User(), rt.Perm("/job-tasks"), rou.taskHostStderr)
			pages.GET("/task/:id/stdout.txt", rt.Auth(), rt.User(), rt.Perm("/job-tasks"), rou.taskStdoutTxt)
			pages.GET("/task/:id/stderr.txt", rt.Auth(), rt.User(), rt.Perm("/job-tasks"), rou.taskStderrTxt)
			pages.GET("/task/:id/stdout.json", rt.Auth(), rt.User(), rt.Perm("/job-tasks"), rou.taskStdoutJSON)
			pages.GET("/task/:id/stderr.json", rt.Auth(), rt.User(), rt.Perm("/job-tasks"), rou.taskStderrJSON)
		}
	}

	api := r.Group("/ibex/v1")
	if len(config.C.BasicAuth) > 0 {
		api = r.Group("/ibex/v1", gin.BasicAuth(config.C.BasicAuth))
	}
	{
		api.POST("/tasks", rou.taskAdd)
		api.GET("/tasks", rou.taskGets)
		api.GET("/tasks/done-ids", rou.doneIds)
		api.GET("/task/:id", rou.taskGet)
		api.PUT("/task/:id/action", rou.taskAction)
		api.GET("/task/:id/stdout", rou.taskStdout)
		api.GET("/task/:id/stderr", rou.taskStderr)
		api.GET("/task/:id/state", rou.taskState)
		api.GET("/task/:id/result", rou.taskResult)
		api.PUT("/task/:id/host/:host/action", rou.taskHostAction)
		api.GET("/task/:id/host/:host/output", rou.taskHostOutput)
		api.GET("/task/:id/host/:host/stdout", rou.taskHostStdout)
		api.GET("/task/:id/host/:host/stderr", rou.taskHostStderr)
		api.GET("/task/:id/stdout.txt", rou.taskStdoutTxt)
		api.GET("/task/:id/stderr.txt", rou.taskStderrTxt)
		api.GET("/task/:id/stdout.json", rou.taskStdoutJSON)
		api.GET("/task/:id/stderr.json", rou.taskStderrJSON)

		// api for edge server
		api.POST("/table/record/list", rou.tableRecordListGet)
		api.POST("/table/record/count", rou.tableRecordCount)
		api.POST("/mark/done", rou.markDone)
		api.POST("/task/meta", rou.taskMetaAdd)
		api.POST("/task/host/", rou.taskHostAdd)
		api.POST("/task/hosts/upsert", rou.taskHostUpsert)
	}
}
@@ -1,93 +0,0 @@
package rpc

import (
	"fmt"
	"os"

	"github.com/ccfos/nightingale/v6/ibex/types"
	"github.com/ccfos/nightingale/v6/models"

	"github.com/toolkits/pkg/logger"
)

// Ping return string 'pong', just for test
func (*Server) Ping(input string, output *string) error {
	*output = "pong"
	return nil
}

func (*Server) GetTaskMeta(id int64, resp *types.TaskMetaResponse) error {
	meta, err := models.TaskMetaGetByID(ctxC, id)
	if err != nil {
		resp.Message = err.Error()
		return nil
	}

	if meta == nil {
		resp.Message = fmt.Sprintf("task %d not found", id)
		return nil
	}

	resp.Script = meta.Script
	resp.Args = meta.Args
	resp.Account = meta.Account
	resp.Stdin = meta.Stdin

	return nil
}

func (*Server) Report(req types.ReportRequest, resp *types.ReportResponse) error {
	if len(req.ReportTasks) > 0 {
		err := handleDoneTask(req)
		if err != nil {
			resp.Message = err.Error()
			return nil
		}
	}

	doings := models.GetDoingCache(req.Ident)

	tasks := make([]types.AssignTask, 0, len(doings))
	for _, doing := range doings {
		tasks = append(tasks, types.AssignTask{
			Id:     doing.Id,
			Clock:  doing.Clock,
			Action: doing.Action,
		})
	}
	resp.AssignTasks = tasks

	return nil
}

func handleDoneTask(req types.ReportRequest) error {
	count := len(req.ReportTasks)
	val, ok := os.LookupEnv("CONTINUOUS_OUTPUT")
	for i := 0; i < count; i++ {
		t := req.ReportTasks[i]
		if ok && val == "1" && t.Status == "running" {
			err := models.RealTimeUpdateOutput(ctxC, t.Id, req.Ident, t.Stdout, t.Stderr)
			if err != nil {
				logger.Errorf("cannot update output, id:%d, hostname:%s, clock:%d, status:%s, err: %v", t.Id, req.Ident, t.Clock, t.Status, err)
				return err
			}
		} else {
			if t.Status == "success" || t.Status == "failed" {
				exist, isEdgeAlertTriggered := models.CheckExistAndEdgeAlertTriggered(req.Ident, t.Id)
				// the ibex agent may report a result more than once; if the task is no longer in the
				// task_host_doing cache, it has already been marked done and needs no further handling
				if !exist {
					continue
				}

				err := models.MarkDoneStatus(ctxC, t.Id, t.Clock, req.Ident, t.Status, t.Stdout, t.Stderr, isEdgeAlertTriggered)
				if err != nil {
					logger.Errorf("cannot mark task done, id:%d, hostname:%s, clock:%d, status:%s, err: %v", t.Id, req.Ident, t.Clock, t.Status, err)
					return err
				}
			}
		}
	}

	return nil
}
@@ -1,61 +0,0 @@
package rpc

import (
	"bufio"
	"fmt"
	"io"
	"net"
	"net/rpc"
	"os"
	"reflect"
	"time"

	"github.com/ccfos/nightingale/v6/pkg/ctx"

	"github.com/toolkits/pkg/logger"
	"github.com/ugorji/go/codec"
)

type Server int

var ctxC *ctx.Context

func Start(listen string, ctx *ctx.Context) {
	ctxC = ctx
	go serve(listen)
}

func serve(listen string) {
	server := rpc.NewServer()
	server.Register(new(Server))

	l, err := net.Listen("tcp", listen)
	if err != nil {
		fmt.Printf("fail to listen on: %s, error: %v\n", listen, err)
		os.Exit(1)
	}

	fmt.Println("rpc.listening:", listen)

	var mh codec.MsgpackHandle
	mh.MapType = reflect.TypeOf(map[string]interface{}(nil))

	duration := time.Duration(100) * time.Millisecond

	for {
		conn, err := l.Accept()
		if err != nil {
			logger.Warningf("listener accept error: %v", err)
			time.Sleep(duration)
			continue
		}

		var bufconn = struct {
			io.Closer
			*bufio.Reader
			*bufio.Writer
		}{conn, bufio.NewReader(conn), bufio.NewWriter(conn)}

		go server.ServeCodec(codec.MsgpackSpecRpc.ServerCodec(bufconn, &mh))
	}
}
@@ -1,159 +0,0 @@
package server

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"path/filepath"
	"syscall"

	"github.com/ccfos/nightingale/v6/ibex/server/config"
	"github.com/ccfos/nightingale/v6/ibex/server/router"
	"github.com/ccfos/nightingale/v6/ibex/server/rpc"
	"github.com/ccfos/nightingale/v6/ibex/server/timer"
	"github.com/ccfos/nightingale/v6/pkg/ctx"
	"github.com/ccfos/nightingale/v6/pkg/httpx"
	"github.com/ccfos/nightingale/v6/pkg/logx"
	"github.com/ccfos/nightingale/v6/storage"

	"github.com/toolkits/pkg/i18n"
)

type Server struct {
	ConfigFile string
	Version    string
}

type ServerOption func(*Server)

func SetConfigFile(f string) ServerOption {
	return func(s *Server) {
		s.ConfigFile = f
	}
}

func SetVersion(v string) ServerOption {
	return func(s *Server) {
		s.Version = v
	}
}

// Run runs the server until a termination signal is received
func Run(isCenter bool, opts ...ServerOption) {
	code := 1
	sc := make(chan os.Signal, 1)
	signal.Notify(sc, syscall.SIGHUP, syscall.SIGINT, syscall.SIGTERM, syscall.SIGQUIT)

	server := Server{
		ConfigFile: filepath.Join("etc", "ibex", "server.toml"),
		Version:    "not specified",
	}

	for _, opt := range opts {
		opt(&server)
	}

	// parse config file
	config.MustLoad(server.ConfigFile)
	config.C.IsCenter = isCenter

	cleanFunc, err := server.initialize()
	if err != nil {
		fmt.Println("server init fail:", err)
		os.Exit(code)
	}

EXIT:
	for {
		sig := <-sc
		fmt.Println("received signal:", sig.String())
		switch sig {
		case syscall.SIGQUIT, syscall.SIGTERM, syscall.SIGINT:
			code = 0
			break EXIT
		case syscall.SIGHUP:
			// reload configuration?
		default:
			break EXIT
		}
	}

	cleanFunc()
	fmt.Println("server exited")
	os.Exit(code)
}

func (s Server) initialize() (func(), error) {
	fns := Functions{}
	bgCtx, cancel := context.WithCancel(context.Background())
	fns.Add(cancel)

	// init i18n
	i18n.Init()

	// init logger
	loggerClean, err := logx.Init(config.C.Log)
	if err != nil {
		return fns.Ret(), err
	}
	fns.Add(loggerClean)

	var ctxC *ctx.Context

	var redis storage.Redis
	if redis, err = storage.NewRedis(config.C.Redis); err != nil {
		return fns.Ret(), err
	}

	// init database
	if config.C.IsCenter {
		db, err := storage.New(config.C.DB)
		if err != nil {
			return fns.Ret(), err
		}
		ctxC = ctx.NewContext(context.Background(), db, redis, true, config.C.CenterApi)
	} else {
		ctxC = ctx.NewContext(context.Background(), nil, redis, false, config.C.CenterApi)
	}

	if err := storage.IdInit(ctxC.Redis); err != nil {
		fmt.Println("cannot init id generator: ", err)
		os.Exit(1)
	}

	timer.CacheHostDoing(ctxC)
	timer.ReportResult(ctxC)
	if config.C.IsCenter {
		go timer.Heartbeat(ctxC)
		go timer.Schedule(ctxC)
		go timer.CleanLong(ctxC)
	}

	// init http server
	r := router.New(ctxC, s.Version)
	httpClean := httpx.Init(config.C.HTTP, bgCtx, r)
	fns.Add(httpClean)

	// start rpc server
	rpc.Start(config.C.RPC.Listen, ctxC)

	// release all the resources
	return fns.Ret(), nil
}

type Functions struct {
	List []func()
}

func (fs *Functions) Add(f func()) {
	fs.List = append(fs.List, f)
}

func (fs *Functions) Ret() func() {
	return func() {
		for i := 0; i < len(fs.List); i++ {
			fs.List[i]()
		}
	}
}
@@ -1,76 +0,0 @@
package timer

import (
	"time"

	"github.com/ccfos/nightingale/v6/ibex/server/config"
	"github.com/ccfos/nightingale/v6/models"
	"github.com/ccfos/nightingale/v6/pkg/ctx"

	"github.com/toolkits/pkg/logger"
)

func Heartbeat(ctx *ctx.Context) {
	if config.C.Heartbeat.Interval == 0 {
		config.C.Heartbeat.Interval = 1000
	}

	for {
		heartbeat(ctx)
		time.Sleep(time.Duration(config.C.Heartbeat.Interval) * time.Millisecond)
	}
}

func heartbeat(ctx *ctx.Context) {
	ident := config.C.Heartbeat.LocalAddr

	err := models.TaskSchedulerHeartbeat(ctx, ident)
	if err != nil {
		logger.Errorf("task scheduler(%s) cannot heartbeat: %v", ident, err)
		return
	}

	dss, err := models.DeadTaskSchedulers(ctx)
	if err != nil {
		logger.Errorf("cannot get dead task schedulers: %v", err)
		return
	}

	cnt := len(dss)
	if cnt == 0 {
		return
	}

	for i := 0; i < cnt; i++ {
		ids, err := models.TasksOfScheduler(ctx, dss[i])
		if err != nil {
			logger.Errorf("cannot get tasks of scheduler(%s): %v", dss[i], err)
			return
		}

		if len(ids) == 0 {
			err = models.DelDeadTaskScheduler(ctx, dss[i])
			if err != nil {
				logger.Errorf("cannot del dead task scheduler(%s): %v", dss[i], err)
				return
			}
		}

		takeOverTasks(ctx, ident, dss[i], ids)
	}
}

func takeOverTasks(ctx *ctx.Context, alive, dead string, ids []int64) {
	count := len(ids)
	for i := 0; i < count; i++ {
		success, err := models.TakeOverTask(ctx, ids[i], dead, alive)
		if err != nil {
			logger.Errorf("cannot take over task: %v", err)
			return
		}

		if success {
			logger.Infof("%s take over task[%d] of %s", alive, ids[i], dead)
		}
	}
}
@@ -1,53 +0,0 @@
package timer

import (
	"fmt"
	"time"

	"github.com/ccfos/nightingale/v6/models"
	"github.com/ccfos/nightingale/v6/pkg/ctx"

	"github.com/toolkits/pkg/logger"
)

// CacheHostDoing caches the entire task_host_doing table to reduce DB load
func CacheHostDoing(ctx *ctx.Context) {
	if err := cacheHostDoing(ctx); err != nil {
		fmt.Println("cannot cache task_host_doing data: ", err)
	}
	go loopCacheHostDoing(ctx)
}

func loopCacheHostDoing(ctx *ctx.Context) {
	for {
		time.Sleep(time.Millisecond * 400)
		if err := cacheHostDoing(ctx); err != nil {
			logger.Warning("cannot cache task_host_doing data: ", err)
		}
	}
}

func cacheHostDoing(ctx *ctx.Context) error {
	doingsFromDb, err := models.TableRecordGets[[]models.TaskHostDoing](ctx, models.TaskHostDoing{}.TableName(), "")
	if err != nil {
		logger.Errorf("models.TableRecordGets fail: %v", err)
	}

	doingsFromRedis, err := models.CacheRecordGets[models.TaskHostDoing](ctx)
	if err != nil {
		logger.Errorf("models.CacheRecordGets fail: %v", err)
	}

	set := make(map[string][]models.TaskHostDoing)
	for _, doing := range doingsFromDb {
		doing.AlertTriggered = false
		set[doing.Host] = append(set[doing.Host], doing)
	}
	for _, doing := range doingsFromRedis {
		doing.AlertTriggered = true
		set[doing.Host] = append(set[doing.Host], doing)
	}

	models.SetDoingCache(set)

	return err
}
@@ -1,27 +0,0 @@
package timer

import (
	"fmt"
	"time"

	"github.com/ccfos/nightingale/v6/models"
	"github.com/ccfos/nightingale/v6/pkg/ctx"

	"github.com/toolkits/pkg/logger"
)

func ReportResult(ctx *ctx.Context) {
	if err := models.ReportCacheResult(ctx); err != nil {
		fmt.Println("cannot report task_host result from alert trigger: ", err)
	}
	go loopReport(ctx)
}

func loopReport(ctx *ctx.Context) {
	d := time.Duration(2) * time.Second
	for {
		time.Sleep(d)
		if err := models.ReportCacheResult(ctx); err != nil {
			logger.Warning("cannot report task_host result from alert trigger: ", err)
		}
	}
}
@@ -1,79 +0,0 @@
package timer

import (
	"time"

	"github.com/ccfos/nightingale/v6/ibex/server/config"
	"github.com/ccfos/nightingale/v6/ibex/server/logic"
	"github.com/ccfos/nightingale/v6/models"
	"github.com/ccfos/nightingale/v6/pkg/ctx"

	"github.com/toolkits/pkg/logger"
)

func Schedule(ctx *ctx.Context) {
	for {
		scheduleOrphan(ctx)
		scheduleMine(ctx)
		time.Sleep(time.Second)
	}
}

func scheduleMine(ctx *ctx.Context) {
	ids, err := models.TasksOfScheduler(ctx, config.C.Heartbeat.LocalAddr)
	if err != nil {
		logger.Errorf("cannot get tasks of scheduler(%s): %v", config.C.Heartbeat.LocalAddr, err)
		return
	}

	count := len(ids)
	for i := 0; i < count; i++ {
		logic.CheckTimeout(ctx, ids[i])
		logic.ScheduleTask(ctx, ids[i])
	}
}

func scheduleOrphan(ctx *ctx.Context) {
	ids, err := models.OrphanTaskIds(ctx)
	if err != nil {
		logger.Errorf("cannot get orphan task ids: %v", err)
		return
	}

	count := len(ids)
	if count == 0 {
		return
	}

	logger.Debug("orphan task ids:", ids)

	for i := 0; i < count; i++ {
		action, err := models.TaskActionGet(ctx, "id=?", ids[i])
		if err != nil {
			logger.Errorf("cannot get task[%d] action: %v", ids[i], err)
			continue
		}

		if action == nil {
			continue
		}

		if action.Action == "pause" {
			continue
		}

		mine, err := models.TakeOverTask(ctx, ids[i], "", config.C.Heartbeat.LocalAddr)
		if err != nil {
			logger.Errorf("cannot take over task[%d]: %v", ids[i], err)
			continue
		}

		if !mine {
			continue
		}

		logger.Debugf("task[%d] is mine", ids[i])

		logic.ScheduleTask(ctx, ids[i])
	}
}
@@ -1,38 +0,0 @@
package timer

import (
	"time"

	"github.com/ccfos/nightingale/v6/models"
	"github.com/ccfos/nightingale/v6/pkg/ctx"

	"github.com/toolkits/pkg/logger"
)

func CleanLong(ctx *ctx.Context) {
	d := time.Duration(24) * time.Hour
	for {
		cleanLongTask(ctx)
		time.Sleep(d)
	}
}

func cleanLongTask(ctx *ctx.Context) {
	ids, err := models.LongTaskIds(ctx)
	if err != nil {
		logger.Error("LongTaskIds:", err)
		return
	}

	if ids == nil {
		return
	}

	count := len(ids)
	for i := 0; i < count; i++ {
		action := models.TaskAction{Id: ids[i]}
		err = action.Update(ctx, "cancel")
		if err != nil {
			logger.Errorf("cannot cancel long task[%d]: %v", ids[i], err)
		}
	}
}
@@ -1,33 +0,0 @@
package types

type TaskMetaResponse struct {
	Message string
	Script  string
	Args    string
	Account string
	Stdin   string
}

type ReportTask struct {
	Id     int64
	Clock  int64
	Status string
	Stdout string
	Stderr string
}

type ReportRequest struct {
	Ident       string
	ReportTasks []ReportTask
}

type AssignTask struct {
	Id     int64
	Clock  int64
	Action string
}

type ReportResponse struct {
	Message     string
	AssignTasks []AssignTask
}
@@ -192,7 +192,7 @@
            "prom_ql": "",
            "queries": [
                {
                    "prom_ql": "elasticsearch_filesystem_data_available_bytes / elasticsearch_filesystem_data_size_in_bytes * 100 \u003c 10",
                    "prom_ql": "elasticsearch_filesystem_data_available_bytes / elasticsearch_filesystem_data_size_bytes * 100 \u003c 10",
                    "severity": 1
                }
            ],
@@ -275,7 +275,7 @@
            "prom_ql": "",
            "queries": [
                {
                    "prom_ql": "elasticsearch_filesystem_data_available_bytes / elasticsearch_filesystem_data_size_in_bytes * 100 \u003c 20",
                    "prom_ql": "elasticsearch_filesystem_data_available_bytes / elasticsearch_filesystem_data_size_bytes * 100 \u003c 20",
                    "severity": 2
                }
            ],
@@ -1078,4 +1078,4 @@
        "update_by": "",
        "uuid": 1717556327360313000
    }
]
]
@@ -4,6 +4,7 @@ ElasticSearch exposes its own monitoring metrics via HTTP JSON; via

For a small cluster, set `local=false` and scrape one node of the cluster; that single node returns monitoring data for every node. For a large cluster, it is recommended to set `local=true` and deploy a scraper on every node, each collecting metrics from its local elasticsearch process.

For a detailed walkthrough of ElasticSearch monitoring, see this [article](https://time.geekbang.org/column/article/628847).

## Configuration example

@@ -1,6 +1,6 @@
# kafka plugin

Kafka's core metrics are all exposed via JMX. For JMX-exposed metrics, collect them with jolokia or the jmx_exporter jar; this plugin is not needed for that.
Kafka's core metrics are all exposed via JMX; see this [article](https://time.geekbang.org/column/article/628498) for details. For JMX-exposed metrics, collect them with jolokia or the jmx_exporter jar; this plugin is not needed for that.

This plugin mainly collects consumer lag data, which cannot be obtained from the Kafka broker's JMX.

@@ -1,6 +1,6 @@
# Kubernetes

This plugin is deprecated. For a series on Kubernetes monitoring, see these [articles](https://flashcat.cloud/categories/kubernetes%E7%9B%91%E6%8E%A7%E4%B8%93%E6%A0%8F/).
This plugin is deprecated. For a series on Kubernetes monitoring, see these [articles](https://flashcat.cloud/categories/kubernetes%E7%9B%91%E6%8E%A7%E4%B8%93%E6%A0%8F/), or this [column](https://time.geekbang.org/column/article/630306).

The built-in alert rules and dashboards under the Kubernetes category remain usable, though.

@@ -2051,7 +2051,7 @@
                "cate": "prometheus",
                "value": "${prom}"
            },
            "definition": "label_values(node_uname_info, instance)",
            "definition": "label_values(node_cpu_seconds_total, instance)",
            "name": "node",
            "selected": "$node",
            "type": "query"

@@ -252,7 +252,7 @@
                "cate": "prometheus",
                "value": "${prom}"
            },
            "definition": "label_values(node_uname_info, instance)",
            "definition": "label_values(node_cpu_seconds_total, instance)",
            "name": "node",
            "selected": "$node",
            "type": "query"

integrations/Process/alerts/process_by_exporter.json (new file, 251 lines)
@@ -0,0 +1,251 @@
[
    {
        "id": 0,
        "group_id": 0,
        "cate": "prometheus",
        "datasource_ids": [0],
        "cluster": "",
        "name": "Process X high number of open files - exporter",
        "note": "",
        "prod": "metric",
        "algorithm": "",
        "algo_params": null,
        "delay": 0,
        "severity": 2,
        "severities": [2],
        "disabled": 1,
        "prom_for_duration": 60,
        "prom_ql": "",
        "rule_config": {
            "algo_params": null,
            "inhibit": false,
            "prom_ql": "",
            "queries": [
                {
                    "prom_ql": "avg by (instance) (namedprocess_namegroup_worst_fd_ratio{groupname=\"X\"}) * 100 \u003e 80",
                    "severity": 2
                }
            ],
            "severity": 0
        },
        "prom_eval_interval": 15,
        "enable_stime": "00:00",
        "enable_stimes": ["00:00"],
        "enable_etime": "23:59",
        "enable_etimes": ["23:59"],
        "enable_days_of_week": ["1", "2", "3", "4", "5", "6", "0"],
        "enable_days_of_weeks": [["1", "2", "3", "4", "5", "6", "0"]],
        "enable_in_bg": 0,
        "notify_recovered": 1,
        "notify_channels": [],
        "notify_groups_obj": null,
        "notify_groups": null,
        "notify_repeat_step": 60,
        "notify_max_number": 0,
        "recover_duration": 0,
        "callbacks": [],
        "runbook_url": "",
        "append_tags": ["alertname=ProcessHighOpenFiles"],
        "annotations": null,
        "extra_config": null,
        "create_at": 0,
        "create_by": "",
        "update_at": 0,
        "update_by": "",
        "uuid": 1717556328248638000
    },
    {
        "id": 0,
        "group_id": 0,
        "cate": "prometheus",
        "datasource_ids": [0],
        "cluster": "",
        "name": "Process X is down - exporter",
        "note": "",
        "prod": "metric",
        "algorithm": "",
        "algo_params": null,
        "delay": 0,
        "severity": 1,
        "severities": [1],
        "disabled": 1,
        "prom_for_duration": 0,
        "prom_ql": "",
        "rule_config": {
            "algo_params": null,
            "inhibit": false,
            "prom_ql": "",
            "queries": [
                {
                    "prom_ql": "sum by (instance) (namedprocess_namegroup_num_procs{groupname=\"X\"}) == 0",
                    "severity": 1
                }
            ],
            "severity": 0
        },
        "prom_eval_interval": 15,
        "enable_stime": "00:00",
        "enable_stimes": ["00:00"],
        "enable_etime": "23:59",
        "enable_etimes": ["23:59"],
        "enable_days_of_week": ["1", "2", "3", "4", "5", "6", "0"],
        "enable_days_of_weeks": [["1", "2", "3", "4", "5", "6", "0"]],
        "enable_in_bg": 0,
        "notify_recovered": 1,
        "notify_channels": [],
        "notify_groups_obj": null,
        "notify_groups": null,
        "notify_repeat_step": 60,
        "notify_max_number": 0,
        "recover_duration": 0,
        "callbacks": [],
        "runbook_url": "",
        "append_tags": ["alertname=ProcessNotRunning"],
        "annotations": null,
        "extra_config": null,
        "create_at": 0,
        "create_by": "",
        "update_at": 0,
        "update_by": "",
        "uuid": 1717556328249307000
    },
    {
        "id": 0,
        "group_id": 0,
        "cate": "prometheus",
        "datasource_ids": [0],
        "cluster": "",
        "name": "Process X is restarted - exporter",
        "note": "",
        "prod": "metric",
        "algorithm": "",
        "algo_params": null,
        "delay": 0,
        "severity": 3,
        "severities": [3],
        "disabled": 1,
        "prom_for_duration": 0,
        "prom_ql": "",
        "rule_config": {
            "algo_params": null,
            "inhibit": false,
            "prom_ql": "",
            "queries": [
                {
                    "prom_ql": "namedprocess_namegroup_oldest_start_time_seconds{groupname=\"X\"} \u003e time() - 60 ",
                    "severity": 3
                }
            ],
            "severity": 0
        },
        "prom_eval_interval": 15,
        "enable_stime": "00:00",
        "enable_stimes": ["00:00"],
        "enable_etime": "23:59",
        "enable_etimes": ["23:59"],
        "enable_days_of_week": ["1", "2", "3", "4", "5", "6", "0"],
        "enable_days_of_weeks": [["1", "2", "3", "4", "5", "6", "0"]],
        "enable_in_bg": 0,
        "notify_recovered": 1,
        "notify_channels": [],
        "notify_groups_obj": null,
        "notify_groups": null,
        "notify_repeat_step": 60,
        "notify_max_number": 0,
        "recover_duration": 0,
        "callbacks": [],
        "runbook_url": "",
        "append_tags": ["alertname=ProcessRestarted"],
        "annotations": null,
        "extra_config": null,
        "create_at": 0,
        "create_by": "",
        "update_at": 0,
        "update_by": "",
        "uuid": 1717556328249744000
    }
]
integrations/Process/alerts/procstat_by_categraf.json (new file, 172 lines)
@@ -0,0 +1,172 @@
[
    {
        "id": 0,
        "group_id": 0,
        "cate": "prometheus",
        "datasource_ids": [0],
        "cluster": "",
        "name": "process handle limit is too low",
        "note": "",
        "prod": "metric",
        "algorithm": "",
        "algo_params": null,
        "delay": 0,
        "severity": 3,
        "severities": [3],
        "disabled": 1,
        "prom_for_duration": 60,
        "prom_ql": "",
        "rule_config": {
            "algo_params": null,
            "inhibit": false,
            "prom_ql": "",
            "queries": [
                {
                    "prom_ql": "procstat_rlimit_num_fds_soft \u003c 2048",
                    "severity": 3
                }
            ],
            "severity": 0
        },
        "prom_eval_interval": 15,
        "enable_stime": "00:00",
        "enable_stimes": ["00:00"],
        "enable_etime": "23:59",
        "enable_etimes": ["23:59"],
        "enable_days_of_week": ["1", "2", "3", "4", "5", "6", "0"],
        "enable_days_of_weeks": [["1", "2", "3", "4", "5", "6", "0"]],
        "enable_in_bg": 0,
        "notify_recovered": 1,
        "notify_channels": ["email", "dingtalk", "wecom"],
        "notify_groups_obj": null,
        "notify_groups": null,
        "notify_repeat_step": 60,
        "notify_max_number": 0,
        "recover_duration": 0,
        "callbacks": [],
        "runbook_url": "",
        "append_tags": [],
        "annotations": null,
        "extra_config": null,
        "create_at": 0,
        "create_by": "",
        "update_at": 0,
        "update_by": "",
        "uuid": 1717556328251224000
    },
    {
        "id": 0,
        "group_id": 0,
        "cate": "prometheus",
        "datasource_ids": [0],
        "cluster": "",
        "name": "there is a process count of 0, indicating that a certain process may have crashed",
        "note": "",
        "prod": "metric",
        "algorithm": "",
        "algo_params": null,
        "delay": 0,
        "severity": 1,
        "severities": [1],
        "disabled": 1,
        "prom_for_duration": 60,
        "prom_ql": "",
        "rule_config": {
            "algo_params": null,
            "inhibit": false,
            "prom_ql": "",
            "queries": [
                {
                    "prom_ql": "procstat_lookup_count == 0",
                    "severity": 1
                }
            ],
            "severity": 0
        },
        "prom_eval_interval": 15,
        "enable_stime": "00:00",
        "enable_stimes": ["00:00"],
        "enable_etime": "23:59",
        "enable_etimes": ["23:59"],
        "enable_days_of_week": ["1", "2", "3", "4", "5", "6", "0"],
        "enable_days_of_weeks": [["1", "2", "3", "4", "5", "6", "0"]],
        "enable_in_bg": 0,
        "notify_recovered": 1,
        "notify_channels": ["email", "dingtalk", "wecom"],
        "notify_groups_obj": null,
        "notify_groups": null,
        "notify_repeat_step": 60,
        "notify_max_number": 0,
        "recover_duration": 0,
        "callbacks": [],
        "runbook_url": "",
        "append_tags": [],
        "annotations": null,
        "extra_config": null,
        "create_at": 0,
        "create_by": "",
        "update_at": 0,
        "update_by": "",
        "uuid": 1717556328254260000
    }
]
integrations/Process/dashboards/process_by_exporter.json (new file, 1158 lines; diff suppressed because it is too large)
integrations/Process/icon/process.png (new binary file, 2.2 KiB; binary file not shown)
File diff suppressed because it is too large
@@ -2085,6 +2085,7 @@
        ],
        "var": [
            {
                "defaultValue": 2,
                "definition": "prometheus",
                "label": "datasource",
                "name": "datasource",
@@ -305,12 +305,8 @@ func (e *AlertCurEvent) DB2FE() error {
	e.CallbacksJSON = strings.Fields(e.Callbacks)
	e.TagsJSON = strings.Split(e.Tags, ",,")
	e.OriginalTagsJSON = strings.Split(e.OriginalTags, ",,")
	if err := json.Unmarshal([]byte(e.Annotations), &e.AnnotationsJSON); err != nil {
		return err
	}
	if err := json.Unmarshal([]byte(e.RuleConfig), &e.RuleConfigJson); err != nil {
		return err
	}
	json.Unmarshal([]byte(e.Annotations), &e.AnnotationsJSON)
	json.Unmarshal([]byte(e.RuleConfig), &e.RuleConfigJson)
	return nil
}

@@ -348,39 +344,6 @@ func (e *AlertCurEvent) DB2Mem() {

		e.TagsMap[arr[0]] = arr[1]
	}

	// Handle legacy rows in the database where FirstTriggerTime is 0
	if e.FirstTriggerTime == 0 {
		e.FirstTriggerTime = e.TriggerTime
	}
}

func FillRuleConfigTplName(ctx *ctx.Context, ruleConfig string) (interface{}, bool) {
	var config RuleConfig
	err := json.Unmarshal([]byte(ruleConfig), &config)
	if err != nil {
		logger.Warningf("failed to unmarshal rule config: %v", err)
		return nil, false
	}

	if len(config.TaskTpls) == 0 {
		return nil, false
	}

	for i := 0; i < len(config.TaskTpls); i++ {
		tpl, err := TaskTplGetById(ctx, config.TaskTpls[i].TplId)
		if err != nil {
			logger.Warningf("failed to get task tpl by id:%d, %v", config.TaskTpls[i].TplId, err)
			return nil, false
		}

		if tpl == nil {
			logger.Warningf("task tpl not found by id:%d", config.TaskTpls[i].TplId)
			return nil, false
		}
		config.TaskTpls[i].TplName = tpl.Title
	}
	return config, true
}

// for webui
@@ -508,11 +471,6 @@ func AlertCurEventDel(ctx *ctx.Context, ids []int64) error {
}

func AlertCurEventDelByHash(ctx *ctx.Context, hash string) error {
	if !ctx.IsCenter {
		_, err := poster.GetByUrls[string](ctx, "/v1/n9e/alert-cur-events-del-by-hash?hash="+hash)
		return err
	}

	return DB(ctx).Where("hash = ?", hash).Delete(&AlertCurEvent{}).Error
}

@@ -631,8 +589,8 @@ func AlertCurEventGetMap(ctx *ctx.Context, cluster string) (map[int64]map[string
	return ret, nil
}

func (e *AlertCurEvent) UpdateFieldsMap(ctx *ctx.Context, fields map[string]interface{}) error {
	return DB(ctx).Model(e).Updates(fields).Error
func (m *AlertCurEvent) UpdateFieldsMap(ctx *ctx.Context, fields map[string]interface{}) error {
	return DB(ctx).Model(m).Updates(fields).Error
}

func AlertCurEventUpgradeToV6(ctx *ctx.Context, dsm map[string]Datasource) error {

@@ -117,10 +117,6 @@ func (e *AlertHisEvent) FillNotifyGroups(ctx *ctx.Context, cache map[int64]*User
	return nil
}

// func (e *AlertHisEvent) FillTaskTplName(ctx *ctx.Context, cache map[int64]*UserGroup) error {

// }

func AlertHisEventTotal(ctx *ctx.Context, prods []string, bgids []int64, stime, etime int64, severity int, recovered int, dsIds []int64, cates []string, query string) (int64, error) {
	session := DB(ctx).Model(&AlertHisEvent{}).Where("last_eval_time between ? and ?", stime, etime)

@@ -11,7 +11,6 @@ import (
|
||||
"github.com/ccfos/nightingale/v6/pkg/poster"
|
||||
"github.com/ccfos/nightingale/v6/pushgw/pconf"
|
||||
|
||||
"github.com/jinzhu/copier"
|
||||
"github.com/pkg/errors"
|
||||
"github.com/toolkits/pkg/logger"
|
||||
"github.com/toolkits/pkg/str"
|
||||
@@ -27,21 +26,6 @@ const (
|
||||
TDENGINE = "tdengine"
|
||||
)
|
||||
|
||||
const (
|
||||
AlertRuleEnabled = 0
|
||||
AlertRuleDisabled = 1
|
||||
|
||||
AlertRuleEnableInGlobalBG = 0
|
||||
AlertRuleEnableInOneBG = 1
|
||||
|
||||
AlertRuleNotNotifyRecovered = 0
|
||||
AlertRuleNotifyRecovered = 1
|
||||
|
||||
AlertRuleNotifyRepeatStep60Min = 60
|
||||
|
||||
AlertRuleRecoverDuration0Sec = 0
|
||||
)
|
||||
|
||||
type AlertRule struct {
|
||||
Id int64 `json:"id" gorm:"primaryKey"`
|
||||
GroupId int64 `json:"group_id"` // busi group id
|
||||
@@ -100,24 +84,6 @@ type AlertRule struct {
	UUID int64 `json:"uuid" gorm:"-"` // tpl identifier
}

type Tpl struct {
	TplId   int64    `json:"tpl_id"`
	TplName string   `json:"tpl_name"`
	Host    []string `json:"host"`
}

type RuleConfig struct {
	Version            string                 `json:"version,omitempty"`
	EventRelabelConfig []*pconf.RelabelConfig `json:"event_relabel_config,omitempty"`
	TaskTpls           []*Tpl                 `json:"task_tpls,omitempty"`
	Queries            interface{}            `json:"queries,omitempty"`
	Triggers           []Trigger              `json:"triggers,omitempty"`
	Inhibit            bool                   `json:"inhibit,omitempty"`
	PromQl             string                 `json:"prom_ql,omitempty"`
	Severity           int                    `json:"severity,omitempty"`
	AlgoParams         interface{}            `json:"algo_params,omitempty"`
}

type PromRuleConfig struct {
	Queries []PromQuery `json:"queries"`
	Inhibit bool        `json:"inhibit"`
@@ -156,16 +122,6 @@ type Trigger struct {
	Mode     int    `json:"mode"`
	Exp      string `json:"exp"`
	Severity int    `json:"severity"`

	Type     string `json:"type,omitempty"`
	Duration int    `json:"duration,omitempty"`
	Percent  int    `json:"percent,omitempty"`
	Joins    []Join `json:"joins"`
}

type Join struct {
	JoinType string   `json:"join_type"`
	On       []string `json:"on"`
}

func GetHostsQuery(queries []HostQuery) []map[string]interface{} {
@@ -192,14 +148,12 @@ func GetHostsQuery(queries []HostQuery) []map[string]interface{} {
			blank := " "
			for _, tag := range lst {
				m["tags like ?"+blank] = "%" + tag + "%"
				m["host_tags like ?"+blank] = "%" + tag + "%"
				blank += " "
			}
		} else {
			blank := " "
			for _, tag := range lst {
				m["tags not like ?"+blank] = "%" + tag + "%"
				m["host_tags not like ?"+blank] = "%" + tag + "%"
				blank += " "
			}
		}
@@ -468,19 +422,6 @@ func (ar *AlertRule) UpdateColumn(ctx *ctx.Context, column string, value interfa
		return DB(ctx).Model(ar).UpdateColumn("annotations", string(b)).Error
	}

	if column == "annotations" {
		newAnnotations := value.(map[string]interface{})
		ar.AnnotationsJSON = make(map[string]string)
		for k, v := range newAnnotations {
			ar.AnnotationsJSON[k] = v.(string)
		}
		b, err := json.Marshal(ar.AnnotationsJSON)
		if err != nil {
			return err
		}
		return DB(ctx).Model(ar).UpdateColumn("annotations", string(b)).Error
	}

	return DB(ctx).Model(ar).UpdateColumn(column, value).Error
}
@@ -738,25 +679,6 @@ func AlertRuleExists(ctx *ctx.Context, id, groupId int64, datasourceIds []int64,
	return false, nil
}

func GetAlertRuleIdsByTaskId(ctx *ctx.Context, taskId int64) ([]int64, error) {
	tpl := "%\"tpl_id\":" + fmt.Sprint(taskId) + "}%"
	cb := "{ibex}/" + fmt.Sprint(taskId) + "%"
	session := DB(ctx).Where("rule_config like ? or callbacks like ?", tpl, cb)

	var lst []AlertRule
	var ids []int64
	err := session.Find(&lst).Error
	if err != nil || len(lst) == 0 {
		return ids, err
	}

	for i := 0; i < len(lst); i++ {
		ids = append(ids, lst[i].Id)
	}

	return ids, nil
}

func AlertRuleGets(ctx *ctx.Context, groupId int64) ([]AlertRule, error) {
	session := DB(ctx).Where("group_id=?", groupId).Order("name")
@@ -1089,19 +1011,3 @@ func GetTargetsOfHostAlertRule(ctx *ctx.Context, engineName string) (map[string]

	return m, nil
}

func (ar *AlertRule) Copy(ctx *ctx.Context) (*AlertRule, error) {
	newAr := &AlertRule{}
	err := copier.Copy(newAr, ar)
	if err != nil {
		logger.Errorf("copy alert rule failed, %v", err)
	}
	return newAr, err
}

func InsertAlertRule(ctx *ctx.Context, ars []*AlertRule) error {
	if len(ars) == 0 {
		return nil
	}
	return DB(ctx).Create(ars).Error
}
@@ -17,7 +17,6 @@ type HostMeta struct {
	EngineName   string                 `json:"engine_name"`
	GlobalLabels map[string]string      `json:"global_labels"`
	ExtendInfo   map[string]interface{} `json:"extend_info"`
	Config       interface{}            `json:"config"`
}

type HostUpdteTime struct {
@@ -1,69 +0,0 @@
package models

import (
	"encoding/json"
	"fmt"

	"github.com/ccfos/nightingale/v6/ibex/server/config"
	"github.com/ccfos/nightingale/v6/pkg/ctx"
	"github.com/ccfos/nightingale/v6/pkg/poster"

	"gorm.io/gorm"
)

func IbexCount(tx *gorm.DB) (int64, error) {
	var cnt int64
	err := tx.Count(&cnt).Error
	return cnt, err
}

func tht(id int64) string {
	return fmt.Sprintf("task_host_%d", id%100)
}

func TableRecordGets[T any](ctx *ctx.Context, table, where string, args ...interface{}) (lst T, err error) {
	if config.C.IsCenter {
		if where == "" || len(args) == 0 {
			err = DB(ctx).Table(table).Find(&lst).Error
		} else {
			err = DB(ctx).Table(table).Where(where, args...).Find(&lst).Error
		}
		return
	}

	return poster.PostByUrlsWithResp[T](ctx, "/ibex/v1/table/record/list", map[string]interface{}{
		"table": table,
		"where": where,
		"args":  args,
	})
}

func TableRecordCount(ctx *ctx.Context, table, where string, args ...interface{}) (int64, error) {
	if config.C.IsCenter {
		if where == "" || len(args) == 0 {
			return IbexCount(DB(ctx).Table(table))
		}
		return IbexCount(DB(ctx).Table(table).Where(where, args...))
	}

	return poster.PostByUrlsWithResp[int64](ctx, "/ibex/v1/table/record/count", map[string]interface{}{
		"table": table,
		"where": where,
		"args":  args,
	})
}

var IBEX_HOST_DOING = "ibex-host-doing"

func CacheRecordGets[T any](ctx *ctx.Context) ([]T, error) {
	lst := make([]T, 0)
	values, _ := ctx.Redis.HVals(ctx.Ctx, IBEX_HOST_DOING).Result()
	for _, val := range values {
		t := new(T)
		if err := json.Unmarshal([]byte(val), t); err != nil {
			return nil, err
		}
		lst = append(lst, *t)
	}
	return lst, nil
}
@@ -1,112 +0,0 @@
package models

import (
	"fmt"
	"time"

	"github.com/ccfos/nightingale/v6/pkg/ctx"

	"gorm.io/gorm"
)

type TaskAction struct {
	Id     int64  `gorm:"column:id;primaryKey"`
	Action string `gorm:"column:action;size:32;not null"`
	Clock  int64  `gorm:"column:clock;not null;default:0"`
}

func (TaskAction) TableName() string {
	return "task_action"
}

func TaskActionGet(ctx *ctx.Context, where string, args ...interface{}) (*TaskAction, error) {
	var obj TaskAction
	ret := DB(ctx).Where(where, args...).Find(&obj)
	if ret.Error != nil {
		return nil, ret.Error
	}

	if ret.RowsAffected == 0 {
		return nil, nil
	}

	return &obj, nil
}

func TaskActionExistsIds(ctx *ctx.Context, ids []int64) ([]int64, error) {
	if len(ids) == 0 {
		return ids, nil
	}

	var ret []int64
	err := DB(ctx).Model(&TaskAction{}).Where("id in ?", ids).Pluck("id", &ret).Error
	return ret, err
}

func CancelWaitingHosts(ctx *ctx.Context, id int64) error {
	return DB(ctx).Table(tht(id)).Where("id = ? and status = ?", id, "waiting").Update("status", "cancelled").Error
}

func StartTask(ctx *ctx.Context, id int64) error {
	return DB(ctx).Model(&TaskScheduler{}).Where("id = ?", id).Update("scheduler", "").Error
}

func CancelTask(ctx *ctx.Context, id int64) error {
	return CancelWaitingHosts(ctx, id)
}

func KillTask(ctx *ctx.Context, id int64) error {
	if err := CancelWaitingHosts(ctx, id); err != nil {
		return err
	}

	now := time.Now().Unix()

	return DB(ctx).Transaction(func(tx *gorm.DB) error {
		err := tx.Model(&TaskHostDoing{}).Where("id = ? and action <> ?", id, "kill").Updates(map[string]interface{}{
			"clock":  now,
			"action": "kill",
		}).Error
		if err != nil {
			return err
		}

		return tx.Table(tht(id)).Where("id = ? and status = ?", id, "running").Update("status", "killing").Error
	})
}

func (a *TaskAction) Update(ctx *ctx.Context, action string) error {
	if !(action == "start" || action == "cancel" || action == "kill" || action == "pause") {
		return fmt.Errorf("action invalid")
	}

	err := DB(ctx).Model(a).Updates(map[string]interface{}{
		"action": action,
		"clock":  time.Now().Unix(),
	}).Error
	if err != nil {
		return err
	}

	if action == "start" {
		return StartTask(ctx, a.Id)
	}

	if action == "cancel" {
		return CancelTask(ctx, a.Id)
	}

	if action == "kill" {
		return KillTask(ctx, a.Id)
	}

	return nil
}

// LongTaskIds returns ids of tasks whose last action is more than two weeks old
func LongTaskIds(ctx *ctx.Context) ([]int64, error) {
	clock := time.Now().Unix() - 604800*2
	var ids []int64
	err := DB(ctx).Model(&TaskAction{}).Where("clock < ?", clock).Pluck("id", &ids).Error
	return ids, err
}
@@ -1,262 +0,0 @@
package models

import (
	"fmt"
	"sync"
	"time"

	"github.com/ccfos/nightingale/v6/ibex/server/config"
	"github.com/ccfos/nightingale/v6/pkg/ctx"
	"github.com/ccfos/nightingale/v6/pkg/poster"
	"github.com/ccfos/nightingale/v6/storage"

	"github.com/toolkits/pkg/logger"
	"gorm.io/gorm"
	"gorm.io/gorm/clause"
)

type TaskHost struct {
	II     int64  `gorm:"column:ii;primaryKey;autoIncrement" json:"-"`
	Id     int64  `gorm:"column:id;uniqueIndex:idx_id_host;not null" json:"id"`
	Host   string `gorm:"column:host;uniqueIndex:idx_id_host;size:128;not null" json:"host"`
	Status string `gorm:"column:status;size:32;not null" json:"status"`
	Stdout string `gorm:"column:stdout;type:text" json:"stdout"`
	Stderr string `gorm:"column:stderr;type:text" json:"stderr"`
}

func (taskHost *TaskHost) Upsert(ctx *ctx.Context) error {
	return DB(ctx).Table(tht(taskHost.Id)).Clauses(clause.OnConflict{
		Columns:   []clause.Column{{Name: "id"}, {Name: "host"}},
		DoUpdates: clause.AssignmentColumns([]string{"status", "stdout", "stderr"}),
	}).Create(taskHost).Error
}

func (taskHost *TaskHost) Create(ctx *ctx.Context) error {
	if config.C.IsCenter {
		return DB(ctx).Table(tht(taskHost.Id)).Create(taskHost).Error
	}
	return poster.PostByUrls(ctx, "/ibex/v1/task/host", taskHost)
}

func TaskHostUpserts(ctx *ctx.Context, lst []TaskHost) (map[string]error, error) {
	if len(lst) == 0 {
		return nil, fmt.Errorf("empty list")
	}

	if !config.C.IsCenter {
		return poster.PostByUrlsWithResp[map[string]error](ctx, "/ibex/v1/task/hosts/upsert", lst)
	}

	errs := make(map[string]error, 0)
	for _, taskHost := range lst {
		if err := taskHost.Upsert(ctx); err != nil {
			errs[fmt.Sprintf("%d:%s", taskHost.Id, taskHost.Host)] = err
		}
	}
	return errs, nil
}

func TaskHostGet(ctx *ctx.Context, id int64, host string) (*TaskHost, error) {
	var ret []*TaskHost
	err := DB(ctx).Table(tht(id)).Where("id=? and host=?", id, host).Find(&ret).Error
	if err != nil {
		return nil, err
	}

	if len(ret) == 0 {
		return nil, nil
	}

	return ret[0], nil
}

func MarkDoneStatus(ctx *ctx.Context, id, clock int64, host, status, stdout, stderr string, edgeAlertTriggered ...bool) error {
	if len(edgeAlertTriggered) > 0 && edgeAlertTriggered[0] {
		return CacheMarkDone(ctx, TaskHost{
			Id:     id,
			Host:   host,
			Status: status,
			Stdout: stdout,
			Stderr: stderr,
		})
	}

	if !config.C.IsCenter {
		return poster.PostByUrls(ctx, "/ibex/v1/mark/done", map[string]interface{}{
			"id":     id,
			"clock":  clock,
			"host":   host,
			"status": status,
			"stdout": stdout,
			"stderr": stderr,
		})
	}

	count, err := TableRecordCount(ctx, TaskHostDoing{}.TableName(), "id=? and host=? and clock=?", id, host, clock)
	if err != nil {
		return err
	}

	if count == 0 {
		// If the task timed out but its result still arrives later, persist
		// stdout and stderr anyway so the user can see them.
		count, err = TableRecordCount(ctx, tht(id), "id=? and host=? and status=?", id, host, "timeout")
		if err != nil {
			return err
		}

		if count == 1 {
			return DB(ctx).Table(tht(id)).Where("id=? and host=?", id, host).Updates(map[string]interface{}{
				"status": status,
				"stdout": stdout,
				"stderr": stderr,
			}).Error
		}
		return nil
	}

	return DB(ctx).Transaction(func(tx *gorm.DB) error {
		err = tx.Table(tht(id)).Where("id=? and host=?", id, host).Updates(map[string]interface{}{
			"status": status,
			"stdout": stdout,
			"stderr": stderr,
		}).Error
		if err != nil {
			return err
		}

		if err = tx.Where("id=? and host=?", id, host).Delete(&TaskHostDoing{}).Error; err != nil {
			return err
		}

		return nil
	})
}

func RealTimeUpdateOutput(ctx *ctx.Context, id int64, host, stdout, stderr string) error {
	return DB(ctx).Transaction(func(tx *gorm.DB) error {
		err := tx.Table(tht(id)).Where("id=? and host=?", id, host).Updates(map[string]interface{}{
			"stdout": stdout,
			"stderr": stderr,
		}).Error
		if err != nil {
			return err
		}

		return nil
	})
}

func CacheMarkDone(ctx *ctx.Context, taskHost TaskHost) error {
	if err := ctx.Redis.HDel(ctx.Ctx, IBEX_HOST_DOING, hostDoingCacheKey(taskHost.Id, taskHost.Host)).Err(); err != nil {
		return err
	}
	TaskHostCachePush(taskHost)

	return nil
}

func WaitingHostList(ctx *ctx.Context, id int64, limit ...int) ([]TaskHost, error) {
	var hosts []TaskHost
	session := DB(ctx).Table(tht(id)).Where("id = ? and status = 'waiting'", id).Order("ii")
	if len(limit) > 0 {
		session = session.Limit(limit[0])
	}
	err := session.Find(&hosts).Error
	return hosts, err
}

func WaitingHostCount(ctx *ctx.Context, id int64) (int64, error) {
	return TableRecordCount(ctx, tht(id), "id=? and status='waiting'", id)
}

func UnexpectedHostCount(ctx *ctx.Context, id int64) (int64, error) {
	return TableRecordCount(ctx, tht(id), "id=? and status in ('failed', 'timeout', 'killfailed')", id)
}

func IngStatusHostCount(ctx *ctx.Context, id int64) (int64, error) {
	return TableRecordCount(ctx, tht(id), "id=? and status in ('waiting', 'running', 'killing')", id)
}

func RunWaitingHosts(ctx *ctx.Context, taskHosts []TaskHost) error {
	count := len(taskHosts)
	if count == 0 {
		return nil
	}

	now := time.Now().Unix()

	return DB(ctx).Transaction(func(tx *gorm.DB) error {
		for i := 0; i < count; i++ {
			if err := tx.Table(tht(taskHosts[i].Id)).Where("id=? and host=?", taskHosts[i].Id, taskHosts[i].Host).Update("status", "running").Error; err != nil {
				return err
			}
			err := tx.Create(&TaskHostDoing{Id: taskHosts[i].Id, Host: taskHosts[i].Host, Clock: now, Action: "start"}).Error
			if err != nil {
				return err
			}
		}

		return nil
	})
}

func TaskHostStatus(ctx *ctx.Context, id int64) ([]TaskHost, error) {
	var ret []TaskHost
	err := DB(ctx).Table(tht(id)).Select("id", "host", "status").Where("id=?", id).Order("ii").Find(&ret).Error
	return ret, err
}

func TaskHostGets(ctx *ctx.Context, id int64) ([]TaskHost, error) {
	var ret []TaskHost
	err := DB(ctx).Table(tht(id)).Where("id=?", id).Order("ii").Find(&ret).Error
	return ret, err
}

var (
	taskHostCache = make([]TaskHost, 0, 128)
	taskHostLock  sync.RWMutex
)

func TaskHostCachePush(taskHost TaskHost) {
	taskHostLock.Lock()
	defer taskHostLock.Unlock()

	taskHostCache = append(taskHostCache, taskHost)
}

func TaskHostCachePopAll() []TaskHost {
	taskHostLock.Lock()
	defer taskHostLock.Unlock()

	all := taskHostCache
	taskHostCache = make([]TaskHost, 0, 128)

	return all
}

func ReportCacheResult(ctx *ctx.Context) error {
	result := TaskHostCachePopAll()
	reports := make([]TaskHost, 0)
	for _, th := range result {
		// An id at or above the initial redis id means the task was generated by a
		// self-healing script triggered by a local alert rule while the edge lost
		// contact with the center. To avoid id collisions between different edge
		// zones, such results are not reported to the database.
		if th.Id >= storage.IDINITIAL {
			logger.Infof("task[%d] host[%s] done, result:[%v]", th.Id, th.Host, th)
		} else {
			reports = append(reports, th)
		}
	}

	if len(reports) == 0 {
		return nil
	}

	errs, err := TaskHostUpserts(ctx, reports)
	if err != nil {
		return err
	}
	for key, err := range errs {
		logger.Warningf("report task_host_cache[%s] result error: %v", key, err)
	}
	return nil
}

@@ -1,65 +0,0 @@
package models

import (
	"encoding/json"
	"fmt"
	"sync"
)

type TaskHostDoing struct {
	Id             int64  `gorm:"column:id;index"`
	Host           string `gorm:"column:host;size:128;not null;index"`
	Clock          int64  `gorm:"column:clock;not null;default:0"`
	Action         string `gorm:"column:action;size:16;not null"`
	AlertTriggered bool   `gorm:"-"`
}

func (TaskHostDoing) TableName() string {
	return "task_host_doing"
}

func (doing *TaskHostDoing) MarshalBinary() ([]byte, error) {
	return json.Marshal(doing)
}

func (doing *TaskHostDoing) UnmarshalBinary(data []byte) error {
	return json.Unmarshal(data, doing)
}

func hostDoingCacheKey(id int64, host string) string {
	return fmt.Sprintf("%s:%d", host, id)
}

var (
	doingLock sync.RWMutex
	doingMaps map[string][]TaskHostDoing
)

func SetDoingCache(v map[string][]TaskHostDoing) {
	doingLock.Lock()
	doingMaps = v
	doingLock.Unlock()
}

func GetDoingCache(host string) []TaskHostDoing {
	doingLock.RLock()
	defer doingLock.RUnlock()

	return doingMaps[host]
}

func CheckExistAndEdgeAlertTriggered(host string, id int64) (exist, isAlertTriggered bool) {
	doingLock.RLock()
	defer doingLock.RUnlock()

	doings := doingMaps[host]
	for _, doing := range doings {
		if doing.Id == id {
			exist = true
			isAlertTriggered = doing.AlertTriggered
			return
		}
	}

	return false, false
}
@@ -1,364 +0,0 @@
package models

import (
	"context"
	"encoding/json"
	"fmt"
	"strings"
	"time"

	"github.com/ccfos/nightingale/v6/ibex/server/config"
	"github.com/ccfos/nightingale/v6/pkg/ctx"
	"github.com/ccfos/nightingale/v6/pkg/poster"
	"github.com/ccfos/nightingale/v6/storage"

	"github.com/toolkits/pkg/str"
	"gorm.io/gorm"
)

type TaskMeta struct {
	Id        int64     `gorm:"column:id;primaryKey;autoIncrement" json:"id"`
	Title     string    `gorm:"column:title;size:255;not null;default:''" json:"title"`
	Account   string    `gorm:"column:account;size:64;not null" json:"account"`
	Batch     int       `gorm:"column:batch;not null;default:0" json:"batch"`
	Tolerance int       `gorm:"column:tolerance;not null;default:0" json:"tolerance"`
	Timeout   int       `gorm:"column:timeout;not null;default:0" json:"timeout"`
	Pause     string    `gorm:"column:pause;size:255;not null;default:''" json:"pause"`
	Script    string    `gorm:"column:script;type:text;not null" json:"script"`
	Args      string    `gorm:"column:args;size:512;not null;default:''" json:"args"`
	Stdin     string    `gorm:"column:stdin;size:1024;not null;default:''" json:"stdin"`
	Creator   string    `gorm:"column:creator;size:64;not null;default:'';index" json:"creator"`
	Created   time.Time `gorm:"column:created;not null;default:CURRENT_TIMESTAMP;type:timestamp;index" json:"created"`
	Done      bool      `json:"done" gorm:"-"`
}

func (TaskMeta) TableName() string {
	return "task_meta"
}

func (taskMeta *TaskMeta) MarshalBinary() ([]byte, error) {
	return json.Marshal(taskMeta)
}

func (taskMeta *TaskMeta) UnmarshalBinary(data []byte) error {
	return json.Unmarshal(data, taskMeta)
}

func (taskMeta *TaskMeta) Create(ctx *ctx.Context) error {
	if config.C.IsCenter {
		return DB(ctx).Create(taskMeta).Error
	}

	id, err := poster.PostByUrlsWithResp[int64](ctx, "/ibex/v1/task/meta", taskMeta)
	if err == nil {
		taskMeta.Id = id
	}

	return err
}

func taskMetaCacheKey(id int64) string {
	return fmt.Sprintf("task:meta:%d", id)
}

func TaskMetaGet(ctx *ctx.Context, where string, args ...interface{}) (*TaskMeta, error) {
	lst, err := TableRecordGets[[]*TaskMeta](ctx, TaskMeta{}.TableName(), where, args...)
	if err != nil {
		return nil, err
	}

	if len(lst) == 0 {
		return nil, nil
	}

	return lst[0], nil
}

// TaskMetaGetByID fetches task metadata by id, going through the cache first
func TaskMetaGetByID(ctx *ctx.Context, id int64) (*TaskMeta, error) {
	meta, err := TaskMetaCacheGet(ctx, id)
	if err == nil {
		return meta, nil
	}

	meta, err = TaskMetaGet(ctx, "id=?", id)
	if err != nil {
		return nil, err
	}

	if meta == nil {
		return nil, nil
	}

	_, err = ctx.Redis.Set(context.Background(), taskMetaCacheKey(id), meta, storage.DEFAULT).Result()

	return meta, err
}

func TaskMetaCacheGet(ctx *ctx.Context, id int64) (*TaskMeta, error) {
	res := ctx.Redis.Get(context.Background(), taskMetaCacheKey(id))
	meta := new(TaskMeta)
	err := res.Scan(meta)
	return meta, err
}

func (m *TaskMeta) CleanFields() error {
	if m.Batch < 0 {
		return fmt.Errorf("arg(batch) should be nonnegative")
	}

	if m.Tolerance < 0 {
		return fmt.Errorf("arg(tolerance) should be nonnegative")
	}

	if m.Timeout < 0 {
		return fmt.Errorf("arg(timeout) should be nonnegative")
	}

	if m.Timeout > 3600*24*5 {
		return fmt.Errorf("arg(timeout) longer than five days")
	}

	if m.Timeout == 0 {
		m.Timeout = 30
	}

	m.Pause = strings.Replace(m.Pause, ",", ",", -1)
	m.Pause = strings.Replace(m.Pause, " ", "", -1)
	m.Args = strings.Replace(m.Args, ",", ",", -1)

	if m.Title == "" {
		return fmt.Errorf("arg(title) is required")
	}

	if str.Dangerous(m.Title) {
		return fmt.Errorf("arg(title) is dangerous")
	}

	if m.Script == "" {
		return fmt.Errorf("arg(script) is required")
	}

	if str.Dangerous(m.Args) {
		return fmt.Errorf("arg(args) is dangerous")
	}

	if str.Dangerous(m.Pause) {
		return fmt.Errorf("arg(pause) is dangerous")
	}

	return nil
}

func (m *TaskMeta) HandleFH(fh string) {
	i := strings.Index(m.Title, " FH: ")
	if i > 0 {
		m.Title = m.Title[:i]
	}
	m.Title = m.Title + " FH: " + fh
}

func (taskMeta *TaskMeta) Cache(ctx *ctx.Context, host string) error {
	tx := ctx.Redis.TxPipeline()
	tx.Set(ctx.Ctx, taskMetaCacheKey(taskMeta.Id), taskMeta, storage.DEFAULT)
	tx.HSet(ctx.Ctx, IBEX_HOST_DOING, hostDoingCacheKey(taskMeta.Id, host), &TaskHostDoing{
		Id:     taskMeta.Id,
		Host:   host,
		Clock:  time.Now().Unix(),
		Action: "start",
	})

	_, err := tx.Exec(ctx.Ctx)

	return err
}

func (taskMeta *TaskMeta) Save(ctx *ctx.Context, hosts []string, action string) error {
	return DB(ctx).Transaction(func(tx *gorm.DB) error {
		if err := tx.Create(taskMeta).Error; err != nil {
			return err
		}

		id := taskMeta.Id

		if err := tx.Create(&TaskScheduler{Id: id}).Error; err != nil {
			return err
		}

		if err := tx.Create(&TaskAction{Id: id, Action: action, Clock: time.Now().Unix()}).Error; err != nil {
			return err
		}

		for i := 0; i < len(hosts); i++ {
			host := strings.TrimSpace(hosts[i])
			if host == "" {
				continue
			}

			err := tx.Exec("INSERT INTO "+tht(id)+" (id, host, status) VALUES (?, ?, ?)", id, host, "waiting").Error
			if err != nil {
				return err
			}
		}

		return nil
	})
}

func (m *TaskMeta) Action(ctx *ctx.Context) (*TaskAction, error) {
	return TaskActionGet(ctx, "id=?", m.Id)
}

func (m *TaskMeta) Hosts(ctx *ctx.Context) ([]TaskHost, error) {
	var ret []TaskHost
	err := DB(ctx).Table(tht(m.Id)).Where("id=?", m.Id).Select("id", "host", "status").Order("ii").Find(&ret).Error
	return ret, err
}

func (m *TaskMeta) KillHost(ctx *ctx.Context, host string) error {
	bean, err := TaskHostGet(ctx, m.Id, host)
	if err != nil {
		return err
	}

	if bean == nil {
		return fmt.Errorf("no such host")
	}

	if !(bean.Status == "running" || bean.Status == "timeout") {
		return fmt.Errorf("current status cannot kill")
	}

	if err := redoHost(ctx, m.Id, host, "kill"); err != nil {
		return err
	}

	return statusSet(ctx, m.Id, host, "killing")
}

func (m *TaskMeta) IgnoreHost(ctx *ctx.Context, host string) error {
	return statusSet(ctx, m.Id, host, "ignored")
}

func (m *TaskMeta) RedoHost(ctx *ctx.Context, host string) error {
	bean, err := TaskHostGet(ctx, m.Id, host)
	if err != nil {
		return err
	}

	if bean == nil {
		return fmt.Errorf("no such host")
	}

	if err := redoHost(ctx, m.Id, host, "start"); err != nil {
		return err
	}

	return statusSet(ctx, m.Id, host, "running")
}

func statusSet(ctx *ctx.Context, id int64, host, status string) error {
	return DB(ctx).Table(tht(id)).Where("id=? and host=?", id, host).Update("status", status).Error
}

func redoHost(ctx *ctx.Context, id int64, host, action string) error {
	count, err := IbexCount(DB(ctx).Model(&TaskHostDoing{}).Where("id=? and host=?", id, host))
	if err != nil {
		return err
	}

	now := time.Now().Unix()
	if count == 0 {
		err = DB(ctx).Table("task_host_doing").Create(map[string]interface{}{
			"id":     id,
			"host":   host,
			"clock":  now,
			"action": action,
		}).Error
	} else {
		err = DB(ctx).Table("task_host_doing").Where("id=? and host=? and action <> ?", id, host, action).Updates(map[string]interface{}{
			"clock":  now,
			"action": action,
		}).Error
	}
	return err
}

func (m *TaskMeta) HostStrs(ctx *ctx.Context) ([]string, error) {
	var ret []string
	err := DB(ctx).Table(tht(m.Id)).Where("id=?", m.Id).Order("ii").Pluck("host", &ret).Error
	return ret, err
}

func (m *TaskMeta) Stdouts(ctx *ctx.Context) ([]TaskHost, error) {
	var ret []TaskHost
	err := DB(ctx).Table(tht(m.Id)).Where("id=?", m.Id).Select("id", "host", "status", "stdout").Order("ii").Find(&ret).Error
	return ret, err
}

func (m *TaskMeta) Stderrs(ctx *ctx.Context) ([]TaskHost, error) {
	var ret []TaskHost
	err := DB(ctx).Table(tht(m.Id)).Where("id=?", m.Id).Select("id", "host", "status", "stderr").Order("ii").Find(&ret).Error
	return ret, err
}

func TaskMetaTotal(ctx *ctx.Context, creator, query string, before time.Time) (int64, error) {
	session := DB(ctx).Model(&TaskMeta{})

	session = session.Where("created > '" + before.Format("2006-01-02 15:04:05") + "'")

	if creator != "" {
		session = session.Where("creator = ?", creator)
	}

	if query != "" {
		// q1 q2 -q3
		arr := strings.Fields(query)
		for i := 0; i < len(arr); i++ {
			if arr[i] == "" {
				continue
			}
			if strings.HasPrefix(arr[i], "-") {
				q := "%" + arr[i][1:] + "%"
				session = session.Where("title not like ?", q)
			} else {
				q := "%" + arr[i] + "%"
				session = session.Where("title like ?", q)
			}
		}
	}

	return IbexCount(session)
}

func TaskMetaGets(ctx *ctx.Context, creator, query string, before time.Time, limit, offset int) ([]TaskMeta, error) {
	session := DB(ctx).Model(&TaskMeta{}).Order("created desc").Limit(limit).Offset(offset)

	session = session.Where("created > '" + before.Format("2006-01-02 15:04:05") + "'")

	if creator != "" {
		session = session.Where("creator = ?", creator)
	}

	if query != "" {
		// q1 q2 -q3
		arr := strings.Fields(query)
		for i := 0; i < len(arr); i++ {
			if arr[i] == "" {
				continue
			}
			if strings.HasPrefix(arr[i], "-") {
				q := "%" + arr[i][1:] + "%"
				session = session.Where("title not like ?", q)
			} else {
				q := "%" + arr[i] + "%"
				session = session.Where("title like ?", q)
			}
		}
	}

	var objs []TaskMeta
	err := session.Find(&objs).Error
	return objs, err
}
|
||||
@@ -1,47 +0,0 @@
package models

import (
	"github.com/ccfos/nightingale/v6/pkg/ctx"

	"gorm.io/gorm"
)

type TaskScheduler struct {
	Id        int64  `gorm:"column:id;primaryKey"`
	Scheduler string `gorm:"column:scheduler;size:128;not null;default:''"`
}

func (TaskScheduler) TableName() string {
	return "task_scheduler"
}

func TasksOfScheduler(ctx *ctx.Context, scheduler string) ([]int64, error) {
	var ids []int64
	err := DB(ctx).Model(&TaskScheduler{}).Where("scheduler = ?", scheduler).Pluck("id", &ids).Error
	return ids, err
}

func TakeOverTask(ctx *ctx.Context, id int64, pre, current string) (bool, error) {
	ret := DB(ctx).Model(&TaskScheduler{}).Where("id = ? and scheduler = ?", id, pre).Update("scheduler", current)
	if ret.Error != nil {
		return false, ret.Error
	}

	return ret.RowsAffected > 0, nil
}

func OrphanTaskIds(ctx *ctx.Context) ([]int64, error) {
	var ids []int64
	err := DB(ctx).Model(&TaskScheduler{}).Where("scheduler = ''").Pluck("id", &ids).Error
	return ids, err
}

func CleanDoneTask(ctx *ctx.Context, id int64) error {
	return DB(ctx).Transaction(func(tx *gorm.DB) error {
		if err := tx.Where("id = ?", id).Delete(&TaskScheduler{}).Error; err != nil {
			return err
		}

		return tx.Where("id = ?", id).Delete(&TaskAction{}).Error
	})
}
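TakeOverTask above relies on a conditional UPDATE as an optimistic compare-and-swap: the row only changes hands if `scheduler` still equals the expected previous owner, so two schedulers racing for the same task cannot both win. A minimal in-memory sketch of that semantics (hypothetical types, not the repo's gorm wiring):

```go
package main

import "fmt"

// taskOwners maps task id -> current scheduler, standing in for the
// task_scheduler table in this sketch.
type taskOwners map[int64]string

// takeOver mirrors TakeOverTask's semantics: the assignment succeeds
// only if the task is still owned by pre (compare-and-swap).
func (t taskOwners) takeOver(id int64, pre, current string) bool {
	if t[id] != pre {
		return false // someone else already claimed it
	}
	t[id] = current
	return true
}

func main() {
	owners := taskOwners{1: "sched-a"}

	// sched-b claims the task from the (dead) sched-a.
	fmt.Println(owners.takeOver(1, "sched-a", "sched-b")) // true

	// A second claimant with a stale expectation loses the race.
	fmt.Println(owners.takeOver(1, "sched-a", "sched-c")) // false
}
```

In the real code the database serializes the two UPDATEs, and `RowsAffected > 0` tells the caller whether its compare-and-swap won.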
@@ -1,47 +0,0 @@
package models

import (
	"github.com/ccfos/nightingale/v6/pkg/ctx"

	"time"
)

type TaskSchedulerHealth struct {
	Scheduler string `gorm:"column:scheduler;uniqueIndex;size:128;not null"`
	Clock     int64  `gorm:"column:clock;not null;index"`
}

func (TaskSchedulerHealth) TableName() string {
	return "task_scheduler_health"
}

func TaskSchedulerHeartbeat(ctx *ctx.Context, scheduler string) error {
	var cnt int64
	err := DB(ctx).Model(&TaskSchedulerHealth{}).Where("scheduler = ?", scheduler).Count(&cnt).Error
	if err != nil {
		return err
	}

	if cnt == 0 {
		ret := DB(ctx).Create(&TaskSchedulerHealth{
			Scheduler: scheduler,
			Clock:     time.Now().Unix(),
		})
		err = ret.Error
	} else {
		err = DB(ctx).Model(&TaskSchedulerHealth{}).Where("scheduler = ?", scheduler).Update("clock", time.Now().Unix()).Error
	}

	return err
}

func DeadTaskSchedulers(ctx *ctx.Context) ([]string, error) {
	clock := time.Now().Unix() - 10
	var arr []string
	err := DB(ctx).Model(&TaskSchedulerHealth{}).Where("clock < ?", clock).Pluck("scheduler", &arr).Error
	return arr, err
}

func DelDeadTaskScheduler(ctx *ctx.Context, scheduler string) error {
	return DB(ctx).Where("scheduler = ?", scheduler).Delete(&TaskSchedulerHealth{}).Error
}
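The health functions above implement a simple lease: each scheduler upserts its `clock` on every heartbeat, and a scheduler whose last beat is more than 10 seconds old (the cutoff DeadTaskSchedulers computes) is declared dead so its tasks can be taken over. A self-contained sketch of that liveness rule (in-memory map instead of the table; names are illustrative):

```go
package main

import (
	"fmt"
	"time"
)

// heartbeats maps scheduler name -> last heartbeat (unix seconds),
// standing in for the task_scheduler_health table in this sketch.
type heartbeats map[string]int64

// beat upserts the scheduler's clock, like TaskSchedulerHeartbeat.
func (h heartbeats) beat(scheduler string, now int64) { h[scheduler] = now }

// dead returns schedulers whose last beat is older than 10 seconds,
// matching the "clock < now-10" cutoff in DeadTaskSchedulers.
func (h heartbeats) dead(now int64) []string {
	var out []string
	for s, clock := range h {
		if clock < now-10 {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	now := time.Now().Unix()
	h := heartbeats{}
	h.beat("alive", now)
	h.beat("stale", now-30)
	fmt.Println(h.dead(now)) // [stale]
}
```

Note the 10-second cutoff implies heartbeats must fire well under that interval, or healthy schedulers would be misclassified as dead.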
@@ -2,9 +2,11 @@ package migrate

import (
	"fmt"

	"github.com/ccfos/nightingale/v6/models"
	"github.com/ccfos/nightingale/v6/pkg/ormx"
+
+	imodels "github.com/flashcatcloud/ibex/src/models"
	"github.com/toolkits/pkg/logger"
	"gorm.io/driver/mysql"
	"gorm.io/gorm"
@@ -26,7 +28,7 @@ func MigrateIbexTables(db *gorm.DB) {
		db = db.Set("gorm:table_options", tableOptions)
	}

-	dts := []interface{}{&models.TaskMeta{}, &models.TaskScheduler{}, &models.TaskSchedulerHealth{}, &models.TaskHostDoing{}, &models.TaskAction{}}
+	dts := []interface{}{&imodels.TaskMeta{}, &imodels.TaskScheduler{}, &imodels.TaskSchedulerHealth{}, &imodels.TaskHostDoing{}, &imodels.TaskAction{}}
	for _, dt := range dts {
		err := db.AutoMigrate(dt)
		if err != nil {
@@ -36,7 +38,7 @@ func MigrateIbexTables(db *gorm.DB) {

	for i := 0; i < 100; i++ {
		tableName := fmt.Sprintf("task_host_%d", i)
-		err := db.Table(tableName).AutoMigrate(&models.TaskHost{})
+		err := db.Table(tableName).AutoMigrate(&imodels.TaskHost{})
		if err != nil {
			logger.Errorf("failed to migrate table:%s %v", tableName, err)
		}
@@ -56,7 +58,7 @@ func MigrateTables(db *gorm.DB) error {
	dts := []interface{}{&RecordingRule{}, &AlertRule{}, &AlertSubscribe{}, &AlertMute{},
		&TaskRecord{}, &ChartShare{}, &Target{}, &Configs{}, &Datasource{}, &NotifyTpl{},
		&Board{}, &BoardBusigroup{}, &Users{}, &SsoConfig{}, &models.BuiltinMetric{},
-		&models.MetricFilter{}, &models.BuiltinComponent{}, &models.NotificaitonRecord{}}
+		&models.MetricFilter{}, &models.BuiltinComponent{}}

	if !columnHasIndex(db, &AlertHisEvent{}, "original_tags") ||
		!columnHasIndex(db, &AlertCurEvent{}, "original_tags") {
@@ -225,15 +227,13 @@ type AlertCurEvent struct {
}

type Target struct {
-	HostIp       string   `gorm:"column:host_ip;type:varchar(15);default:'';comment:IPv4 string;index:idx_host_ip"`
-	AgentVersion string   `gorm:"column:agent_version;type:varchar(255);default:'';comment:agent version;index:idx_agent_version"`
-	EngineName   string   `gorm:"column:engine_name;type:varchar(255);default:'';comment:engine name;index:idx_engine_name"`
-	OS           string   `gorm:"column:os;type:varchar(31);default:'';comment:os type;index:idx_os"`
-	HostTags     []string `gorm:"column:host_tags;type:text;comment:global labels set in conf file;serializer:json"`
+	HostIp       string `gorm:"column:host_ip;varchar(15);default:'';comment:IPv4 string;index:idx_host_ip"`
+	AgentVersion string `gorm:"column:agent_version;varchar(255);default:'';comment:agent version;index:idx_agent_version"`
+	EngineName   string `gorm:"column:engine_name;varchar(255);default:'';comment:engine name;index:idx_engine_name"`
}

type Datasource struct {
-	IsDefault bool `gorm:"column:is_default;type:boolean;comment:is default datasource"`
+	IsDefault bool `gorm:"column:is_default;type:boolean;not null;comment:is default datasource"`
}

type Configs struct {
Some files were not shown because too many files have changed in this diff.