36 Commits

Author SHA1 Message Date
Gage Hugo
19317b5c6c Update define-nagios-hosts.py error handling
Currently, if list_nodes fails in the define-nagios-hosts.py
script, the entire script fails with an unhandled, unclear
error. This change updates the script to catch and report
any exceptions that occur.

Change-Id: I0e33f47af8ad8f69f2f1e4a5b377d0e31d0c0819
2022-04-03 21:24:37 +00:00
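A minimal sketch of the kind of error handling 19317b5c6c describes. It assumes a hypothetical `list_nodes` callable that stands in for the script's node-listing call. The real script's function names, messages and exit codes are not shown in this log.

```python
import sys


def define_hosts(list_nodes):
    """Build host names from list_nodes(), reporting failures instead of crashing.

    `list_nodes` is a hypothetical stand-in for the script's node-listing call.
    Returns a tuple (exit_code, hosts).
    """
    try:
        nodes = list_nodes()
    except Exception as exc:
        # Report the failure with context instead of a bare traceback.
        print(f"ERROR: unable to list nodes: {exc}", file=sys.stderr)
        return 1, []
    return 0, [node["name"] for node in nodes]


def failing_list_nodes():
    # Hypothetical failure used to exercise the error path.
    raise ConnectionError("API server unreachable")
```

With a working callable the host list comes back with exit code 0. With `failing_list_nodes`, the error is printed to stderr and an empty list is returned with exit code 1.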
Chris Straut (cs4987)
a21cb2a0af Fix issue with retry logic querying prometheus
The issue was that a successful response from Prometheus did not
trigger an exit from the retry loop. Now, on a successful query,
the retry loop breaks and the script exits successfully.

Change-Id: I528c1c17d2131256097cac5a67ec7ea17541c685
2021-08-26 11:00:29 -05:00
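The fix in a21cb2a0af comes down to leaving the retry loop as soon as a query succeeds. The following is a sketch of that pattern only, not the plugin's code. `query` is a hypothetical callable standing in for the Prometheus request, and `retries`/`delay` are illustrative parameters.

```python
import time


def query_with_retry(query, retries=3, delay=0.0):
    """Call query() up to `retries` times and stop at the first success.

    `query` is a hypothetical callable that raises an exception on failure.
    Returns (result, number_of_attempts).
    """
    attempt = 0
    last_error = None
    while attempt < retries:
        attempt += 1
        try:
            result = query()
            break  # success: leave the retry loop immediately
        except Exception as exc:
            last_error = exc
            time.sleep(delay)
    else:
        # while/else runs only when the loop ended without a break,
        # i.e. every attempt failed.
        raise RuntimeError(f"query failed after {retries} attempts") from last_error
    return result, attempt
```

Python's `while ... else` keeps the "all attempts failed" branch separate from the success path, which is the distinction the original loop was missing.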
Chris Straut (cs4987)
5522c1856e Add retry around Nagios Request
Added a retry around the Nagios request commands. Updated the code
based on comments and feedback.

Change-Id: I24588c112e2b5ec954f857550bda7d78bdf6d03e
2021-07-30 14:01:51 -05:00
Gupta, Sangeet (sg774j)
f6e73efa87 Nagios: Add support to communicate with servers with TLS
Added support for talking to TLS-enabled Prometheus and
Elasticsearch by passing the CA cert to the request object.

Change-Id: I0616b3e5d251cc6c9cd3cc28bc44977ff5164b3c
2021-06-29 13:12:20 +00:00
Smith, David (ds3330)
a262c243d2 Fix logic for critical_threshold to match the documentation
Change-Id: I1ac031106d487d61688207c714911fd91d1a717f
2021-02-04 22:53:18 +00:00
radhika pai
2b3e8a8df6 Nagios: ES plugin update
Updated the default timeout from 30 sec to 120 sec, as larger
queries were taking longer than 30 sec and resulting in UNKNOWN alerts.
Changed the service check result from "unknown" to "warning", since
the UNKNOWN alerts were causing issues with ticketing.

Change-Id: I65919207be8b5422ffb13f3d1ccfff0323f23168
2020-10-28 15:39:53 +00:00
Meg Heisler
bb3086e504 Hide the password in nagios error message
The Nagios API password was being revealed in several
error messages. This change uses a regex to remove the password
and replace it with a placeholder to avoid exposing it.

Change-Id: I8771cbc3127edba47dff8db5de0990659d4d2b49
2020-09-10 14:27:56 -05:00
John Lawrence
f15ec37233 To make the script into a Nagios plugin
Converted the script to comply with the Nagios plugin format.

Change-Id: I08b9af0a986cbd0e841dea721ec71f0980f956f0
2020-07-13 19:38:05 +00:00
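The Nagios plugin format referred to in f15ec37233 boils down to two parts:

* a status line printed on stdout;
* an exit code of 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN).

The sketch below shows that contract. `run_check` and the check callable are hypothetical helpers, not taken from the converted script.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_TEXT = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def run_check(check):
    """Run a check and map its outcome to (exit code, status line).

    `check` is a hypothetical callable returning (code, message). Any uncaught
    exception is reported as UNKNOWN instead of a Python traceback.
    """
    try:
        code, message = check()
    except Exception as exc:
        code, message = UNKNOWN, f"check raised {type(exc).__name__}: {exc}"
    return code, f"{STATUS_TEXT[code]}: {message}"


def main():
    code, line = run_check(lambda: (OK, "check passed"))
    print(line)  # Nagios displays the first line of stdout as the status text
    return code

# Installed as a plugin, the script would end with: sys.exit(main())
```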
Singh, Jasvinder (js581j)
543ed9c941 Correcting the rbd monitoring script
Updated the script so that it works with multiple storage classes.
Also fixed the script at certain failure points.

Change-Id: Ic3d7c6b4877fc5ce4e1ce3b58b05bb7b138b0c80
2020-06-23 17:38:41 -04:00
IPATOV, DENIS (di0361)
b0746f732b Change kubectl binary to kubernetes library
Change-Id: I9f6ed6f286a50a70380625411b8447709bc64d42
2020-06-09 13:56:30 +00:00
Gage Hugo
e9b2ff0c74 Remove OSH Authors copyright
The current copyright refers to a non-existent group
"openstack helm authors" with often out-of-date references that
are confusing when adding a new file to the repo.

This change removes all references to this copyright by the
non-existent group and any blank lines underneath.

Change-Id: Ic78d29883364378cc14b11402f16d99dcec1fc96
2020-05-07 02:11:23 +00:00
MirgDenis
548c599498 Align Nagios plugins with Python3
This commit aligns the Nagios plugins with Python 3.
Since dict.iteritems() was removed in Python 3 in favor of
dict.items(), the plugins have to be updated accordingly.

Change-Id: I782f90a91e8dadd959c4d8537a80c44180c0b78d
2020-04-30 18:00:43 +03:00
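A small illustration of the change made in 548c599498. The `metrics` dictionary is hypothetical example data.

```python
metrics = {"cpu": 0.42, "memory": 0.73}  # hypothetical example data

# Python 2 only. Python 3 dicts have no iteritems(), so this raises AttributeError:
#     for name, value in metrics.iteritems(): ...

# Python 3 replacement: items() returns a view that is iterated lazily, much
# like iteritems() was (on Python 2, items() built a full list instead).
for name, value in metrics.items():
    print(f"{name}={value}")
```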
Radhika Pai
f3808a2622 Nagios: The plugin script is updated to hide password in url
The code is updated so that the password in a URL is obscured in the output,
e.g. http://username:password@example.com becomes
http://username:???@example.com

Change-Id: I775ad08e929e34f06ef8a1ac44382006f5ae3ad5
2020-04-20 21:03:44 +00:00
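The commit does not say how the URL is rewritten. This sketch uses the standard library's `urllib.parse` rather than a regex. It produces the `???` placeholder shown in the example above. `hide_url_password` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit


def hide_url_password(url, placeholder="???"):
    """Replace the password in a URL's user:password@ part with a placeholder."""
    parts = urlsplit(url)
    if "@" not in parts.netloc:
        return url  # no credentials in the URL
    userinfo, hostport = parts.netloc.rsplit("@", 1)
    if ":" not in userinfo:
        return url  # username only, no password to hide
    username = userinfo.split(":", 1)[0]
    return urlunsplit(parts._replace(netloc=f"{username}:{placeholder}@{hostport}"))
```

Working on the parsed netloc keeps the path, query string and port untouched. A regex over the whole message, as used for the error-message fix in bb3086e504, also catches passwords that do not appear inside a URL.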
John Lawrence
84f1e2a8bd Nagios python and ubuntu version upgrades
Upgrades to Ubuntu Bionic and from Python 2.x to 3.x.

Change-Id: I06399204d30c3607ab3bcb7cfd464a216571edea
2020-03-25 14:34:06 +00:00
dmyrhorodskyi
c78298328e Change condition in define-host plugin
This commit changes a condition in the define-host plugin to
prevent it from failing with "KeyError: 'NODE_DOMAIN'" when that
environment variable is not set.

Change-Id: I030e3f01ca9d25f3946fd621635f422d3278f21e
2020-03-13 15:54:12 +02:00
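The exact condition in the define-host plugin is not shown here. This sketch assumes the domain is read from `NODE_DOMAIN`, the variable named in the error, and appended to the host name as described in cec7ff6963. `host_display_name` is a hypothetical helper.

```python
import os


def host_display_name(host_name, environ=os.environ):
    """Return "host_name.NODE_DOMAIN" if NODE_DOMAIN is set, else host_name.

    environ.get() returns None for a missing variable, avoiding the KeyError
    that environ["NODE_DOMAIN"] raises.
    """
    domain = environ.get("NODE_DOMAIN")
    if domain:
        return f"{host_name}.{domain}"
    return host_name
```

Passing `environ` as a parameter keeps the function testable without modifying the real process environment.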
Zuul
5eb530430b Merge "Replace node_name with host_name in nagios plugin" 2020-03-12 20:48:48 +00:00
Meg Heisler
7502ee6e47 Replace node_name with host_name in nagios plugin
This fixes a typo in the nagios define-host plugin that
would append a domain name to the hostname

Change-Id: I62c3eca27d3ced28d2abe18d75bb6e71889c8ee3
2020-03-12 15:25:35 -05:00
John Lawrence
f3ae8605be Nagios prometheus plugin connections errors
Sometimes the Nagios-to-Prometheus connection
takes longer than 20 seconds, extending to
as much as 40 seconds.

Change-Id: I20b8d47cbc8eeb08d93bf902f922f8bbf8769839
2020-03-12 18:44:18 +00:00
Zuul
52e9745293 Merge "Append the domain name to the host name in Nagios dashboard" 2020-03-10 16:31:58 +00:00
Meg Heisler
cec7ff6963 Append the domain name to the host name in Nagios dashboard
This adds logic to check whether an environment variable has been
set for the domain name and, if it has, appends it to the host name
so that the full FQDN appears in the Nagios dashboard.
If the domain name has not been set, only the host name is
shown, as before.

Change-Id: Id42edb073d4701ddb61f4957af7e5ac5f931dfbf
2020-03-10 10:05:55 -05:00
John Lawrence
ded4c59dc2 Enhance the ES plugin to check the results count field.
The plugin script used [hits][hits] to check the total
result count, so it was updated to use the correct field.

Change-Id: I371302bc24e59320a59bd815922d41e387e23e3a
2020-03-04 16:57:17 +00:00
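The commit does not name the field it switched to. The sketch below assumes it is `hits.total`, the Elasticsearch field that carries the match count. It also handles both response shapes: an integer before Elasticsearch 7 and an object from 7.0 on. `total_hits` is a hypothetical helper.

```python
def total_hits(response):
    """Return the total hit count from an Elasticsearch search response.

    len(response["hits"]["hits"]) counts only the documents returned in this
    page, which is bounded by the request's size, not the total number of matches.
    hits.total is an integer before Elasticsearch 7 and an object such as
    {"value": 1234, "relation": "eq"} from 7.0 on. A relation of "gte" means
    the value is a lower bound (see track_total_hits).
    """
    total = response["hits"]["total"]
    if isinstance(total, dict):
        return total["value"]
    return total
```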
John Lawrence
f80358d3e9 Nagios elasticsearch plugin fix for the new version of Elasticsearch
The unnecessary conditional check was throwing exceptions.

Change-Id: Iefa4a95362168f3fb947dbe0049e567236c7830c
2020-01-29 16:15:26 +00:00
Pai, Radhika (rp592h)
3f338cac5b Nagios plugin: snmp alarm notification enhancement
Updated the script to return a status along with the message.
This helps the calling plugins check the status.

Change-Id: I17f7db72240dd53513064f5180f5914c9c638ed5
2019-12-31 18:35:39 +00:00
Steve Wilkerson
0635f74e9e Nagios: Update check_exporter_health_metric plugin
This updates the Nagios plugin that checks health metrics exposed
directly by an exporter endpoint. To address cases where not all
exporter replicas are active, the plugin now uses the Python
Kubernetes client to programmatically determine which endpoint
behind the exporter service is active, and returns the metrics
exposed by that endpoint. This allows more robust checking of
service health in scenarios where bypassing Prometheus is desired.

Change-Id: I14e21936d1808a4f41b20368451da95100075dda
Signed-off-by: Steve Wilkerson <sw5822@att.com>
2019-11-13 03:37:19 +00:00
John Lawrence
01210c2a09 Nagios http POST event handler timeout
The current timeout is too low to be effective.
A default timeout of 20s is added.

Change-Id: I9a912776d13585a1fbd75c5fc4bc7a297b8e225e
2019-11-05 22:40:48 +00:00
Zuul
6f540115cd Merge "Include default timeout values to the nagios plugins" 2019-10-17 02:44:50 +00:00
John Lawrence
8325c8d33f Include default timeout values to the nagios plugins
This will prevent service checks from queuing up.

Change-Id: I8cd69ed393c107f5126b4d9d4ce931ecce86cd8b
2019-10-16 21:50:42 +00:00
IPATOV, DENIS (di0361)
3ae2b8b445 Monitoring PVC-PV-RBD and mapping them
The script monitors whether each PVC has an associated PV and whether each PV has an associated RBD.

usage: pvc_rbd_monitoring.py [-h] [--rbd] [--pvc] [--all] [--pv] [--bin]
                                  [--config] [--debug | --silent]

optional arguments:
  -h, --help    show this help message and exit
  --debug, -d   enable debugging (default: False)
  --silent, -s  don't log into console (default: False)

monitoring settings:
  --rbd, -r     check rbd (default: False)
  --pvc, -p     check persistent volume claim (default: False)
  --all, -a     all checks (default: False)
  --pv          check persistent volumes (default: False)
  --bin, -e     path to kubectl binary (default: None)
  --config, -c  path to kubectl config (default: None)

Change-Id: Ic5e594f37bb292b4e0ce5856ec54af53e3a5fab0
2019-10-14 13:55:08 +00:00
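The usage text above looks like `argparse` output. A sketch of a parser that produces similar help follows. The script's actual argument definitions are not part of this log. The sketch assumes that `--bin` and `--config` each take a path value, and the help wording here is approximate.

```python
import argparse


def build_parser():
    """Approximate the pvc_rbd_monitoring.py command line (a sketch, not the original)."""
    parser = argparse.ArgumentParser(
        prog="pvc_rbd_monitoring.py",
        # Appends "(default: ...)" to each help string, as in the usage above.
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    checks = parser.add_argument_group("monitoring settings")
    checks.add_argument("--rbd", "-r", action="store_true", help="check rbd")
    checks.add_argument("--pvc", "-p", action="store_true", help="check persistent volume claim")
    checks.add_argument("--all", "-a", action="store_true", help="all checks")
    checks.add_argument("--pv", action="store_true", help="check persistent volumes")
    checks.add_argument("--bin", "-e", help="path to kubectl binary")
    checks.add_argument("--config", "-c", help="path to kubectl config")

    # --debug and --silent cannot be combined, hence "[--debug | --silent]".
    verbosity = parser.add_mutually_exclusive_group()
    verbosity.add_argument("--debug", "-d", action="store_true", help="enable debugging")
    verbosity.add_argument("--silent", "-s", action="store_true", help="don't log into console")
    return parser
```

For example, `build_parser().parse_args(["--all", "--debug"])` enables every check with debug logging. Passing both `-d` and `-s` makes argparse print an error and exit.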
Zuul
df6e2e83de Merge "To enhance the nagios elasticsearch plugin" 2019-08-27 20:25:00 +00:00
John Lawrence
7817c11c31 To enhance the nagios elasticsearch plugin
Fixes include type validation, index field changes, counts,
and more meaningful details in the critical messages.

Change-Id: Ib8d8a87be4e0526378aa04ccd8ff5631805adfeb
2019-08-26 20:19:15 +00:00
Pai, Radhika (rp592h)
6428bb3237 Nagios: Updated the plugin to handle warnings
The plugin previously triggered Critical for every Prometheus alert,
regardless of its severity.
Now the code also handles Prometheus alerts with severity=warning,
in addition to severity=page.
This should help with alarm tuning in Prometheus and Nagios.

Change-Id: I89c1880ab05b896590391db611354b069ade363a
2019-08-20 15:28:52 +00:00
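A sketch of the severity handling described in 6428bb3237. It maps the Prometheus `severity` label to a Nagios state and reports the worst one. The alert dicts are assumed to carry a `labels` mapping, as entries from Prometheus's `/api/v1/alerts` do. Treating unrecognized severities as CRITICAL, which preserves the old behavior, is an assumption.

```python
OK, WARNING, CRITICAL = 0, 1, 2

# Assumed mapping of Prometheus severity labels to Nagios states.
SEVERITY_TO_STATE = {"page": CRITICAL, "warning": WARNING}


def nagios_state(alerts):
    """Return the worst Nagios state for a list of firing alerts.

    Each alert is a dict with a "labels" dict. Severities not in the mapping
    fall back to CRITICAL (an assumption, matching the earlier behavior).
    """
    state = OK
    for alert in alerts:
        severity = alert.get("labels", {}).get("severity")
        state = max(state, SEVERITY_TO_STATE.get(severity, CRITICAL))
    return state
```

Taking the maximum works because the Nagios codes are ordered by seriousness for these three states.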
Pai, Radhika (rp592h)
28ccb72c49 Nagios: Updated plugin script to handle Null value
This script previously reported OK for null values in the
metrics dictionary.
Edited the script to handle a null "metrics" dictionary.
Updated the indentation.

Change-Id: I760d6ac4fc5341361d064a8a15f6e44287d48f40
2019-07-19 15:58:19 +00:00
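The idea behind 28ccb72c49 is that missing data must not be reported as OK. The sketch below covers both cases: a null dictionary and null values inside it. Returning UNKNOWN for "no data" is an assumption, since the commit does not say which state is used. `evaluate` and its single `threshold` are also hypothetical.

```python
OK, CRITICAL, UNKNOWN = 0, 2, 3


def evaluate(metrics, threshold):
    """Check metric values against a threshold without treating "no data" as OK.

    `metrics` maps names to numbers and may be None or contain None values.
    Returns (state, message).
    """
    values = {k: v for k, v in (metrics or {}).items() if v is not None}
    if not values:
        return UNKNOWN, "no metric values returned"
    breached = sorted(name for name, value in values.items() if value > threshold)
    if breached:
        return CRITICAL, "threshold exceeded: " + ", ".join(breached)
    return OK, "all metrics within threshold"
```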
Steve Wilkerson
381737ee61 Nagios: Update plugin for defining hosts and host groups
This renames the check_update_prometheus_hosts plugin to be more
representative of what the current functionality does, which is
to simply define nagios hosts. This also updates the behavior of
the plugin to no longer force a reload of nagios via a hangup
signal when attempting to update the hosts file. The result is a
significant reduction in the logs output by the Nagios service,
which will better enable tracking history of service checks and
hosts.

Instead of being run as a recurring check, this plugin can now
be run as an init container for the Nagios pod, so that Nagios has a
comprehensive list of its hosts and host groups before the
service starts.

Change-Id: Ife2cdf2112db3798dbde73bafe436ef3c0c8a870
Signed-off-by: Steve Wilkerson <sw5822@att.com>
2019-06-28 11:03:02 -05:00
Steve Wilkerson
eae35b7739 Nagios: Update Nagios host definition plugin
This updates the plugin responsible for defining Nagios's hosts
and hostgroups to use the Kubernetes python client instead of
querying Prometheus for this information. This results in a more
predictable and reliable list of hosts for Nagios to use, as
querying Prometheus for scalar metrics at a point in time could
result in a host not being added correctly if that host is down
when Nagios queries Prometheus to generate the list of hosts.

Change-Id: I962696eac7c9cc94650666a1d3a60c610d1ae867
Signed-off-by: Steve Wilkerson <sw5822@att.com>
2019-06-26 17:58:29 -05:00
Steve Wilkerson
e1a1faec28 Nagios: Make scripts and plugins executable
This makes the entrypoint script and the plugins included with
the Nagios image executable.

Change-Id: Iaeb2fad62ac213b74637dadc329e7ea304602ab8
Signed-off-by: Steve Wilkerson <sw5822@att.com>
2019-06-26 17:35:55 -05:00
Steve Wilkerson
dbf42a2301 Add Nagios image to openstack-helm-images
This adds the Prometheus-aware Nagios Core 4 image built for
openstack-helm-infra to the openstack-helm-images repository.

Change-Id: Icd7bcdee59f1dc719d0dc5e7517294ac922f680e
Signed-off-by: Steve Wilkerson <sw5822@att.com>
2019-06-25 02:47:43 +00:00