The recently added crush weight comparison in reweight_osds() that
checks weights for zero isn't working correctly because the
expected weight is calculated to two decimal places and then
compared against "0" as a string. This updates the comparison
string to "0.00" to match the calculation.
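The failure mode can be sketched in shell (a hypothetical reconstruction; variable names are illustrative, not the actual reweight_osds() code):

```shell
# The expected weight is formatted with two decimal places
osd_size_tb=0
expected_weight=$(printf '%.2f' "${osd_size_tb}")

# Broken guard: "0.00" is never string-equal to "0", so it never fires
if [ "${expected_weight}" = "0" ]; then
  result="skipped (old check)"
fi

# Fixed guard: compare against the same two-decimal representation
if [ "${expected_weight}" = "0.00" ]; then
  result="skipped (new check)"
fi
echo "${result}"
```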
Change-Id: I29387a597a21180bb7fba974b4daeadf6ffc182d
The PS updates the post-apply job and moves execution of the command
outside of the if statement. The output of the command is stored in a
variable, which is then checked in the if statement. Added "-z" to
correct the string-length comparison on the variable; it was
accidentally missed in the initial PS.
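The pattern looks roughly like this (a hypothetical sketch, not the actual job script):

```shell
# Run the command once and capture its output, instead of calling it
# inside the test itself
pods=$(grep 'Evicted' /dev/null || true)   # stand-in command; empty output here

# [ -z STRING ] is true when STRING has zero length
if [ -z "${pods}" ]; then
  status="nothing to clean up"
else
  status="found: ${pods}"
fi
echo "${status}"
```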
Change-Id: I907f75d0a9e5ef27fba5306ddb86199e94b01b3b
1) A separate prometheus job is needed to provide targets and scrape metrics
2) It is based on https://github.com/prometheus/blackbox_exporter
Adds a script file for deployment and a job under zuul.d
Change-Id: Ia15ab7d8ef882886fe0e37cc2599e6815d7bcc6c
If circumstances are such that the reweight function believes
OSD disks have zero size, refrain from reweighting OSDs to 0.
This can happen if OSDs are deployed with the noup flag set.
Also move the setting and unsetting of flags above this
calculation as an additional precautionary measure.
Change-Id: Ibc23494e0e75cfdd7654f5c0d3b6048b146280f7
This implements a security context override at the pod level and adds
a read-only root filesystem to the keystone-webhook container
Change-Id: Ia67947b7323e41363a5ee379c0dfb001936b5107
This change allows us to substitute values into our rules files.
Example:
- alert: my_region_is_down
expr: up{region="{{ $my_region }}"} == 0
To support this change, rule annotations that used the expansion
{{ $labels.foo }} had to be surrounded with "{{` ... `}}" to render
correctly.
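An annotation that expands a label would then look roughly like this (illustrative fragment using Go template raw-string escaping, not a rule from this change):

```yaml
annotations:
  summary: '{{`Instance {{ $labels.instance }} is down`}}'
```

The backtick raw string passes the inner `{{ $labels.instance }}` through the chart templating untouched, so Prometheus can expand it at alert time.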
Change-Id: Ia7ac891de8261acca62105a3e2636bd747a5fbea
The PS updates the wait_for_pods function and adds a query to filter
pods that are not in the Running or Succeeded state.
The PS also reduces the number of 'kubectl get' requests.
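The filtering idea can be sketched over sample data (illustrative logic, not the exact PS query):

```shell
# Phases from a single pod listing; only pods that are neither Running
# nor Succeeded still need to be waited on
phases="Running Succeeded Pending Running Failed"
not_ready=""
for phase in ${phases}; do
  case "${phase}" in
    Running|Succeeded) ;;                       # healthy, ignore
    *) not_ready="${not_ready} ${phase}" ;;     # still waiting on these
  esac
done
echo "waiting on:${not_ready}"
```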
Change-Id: Ie2abdaf0a87ca377f5ce287a3de9e87d1ca6c0d4
Pass a parameter from the job allowing helm tests to be parallelized
using separate scripts.
Change-Id: I3e06c5590d51c75448dc5ff5978dc7fc90daca6f
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
With this commit minikube is installed using the contents of a
pre-created minikube-aio image containing the installation script, all
required binaries and images inside. Pulling a single image from
dockerhub via the opendev dockerhub proxy and loading the images saves
up to 6 minutes in minikube installation.
Change-Id: I5936f440eb0567b8dcba2fdae614e4c5e88a7b9a
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
This chart could deploy fluentd either as a Deployment
or a DaemonSet. Both options would use the deployment-fluentd
template, with various sections toggled off based on values.yaml.
I'd like to know: does anyone run this chart as a Deployment?
We can simplify the chart, and the zuul gates, by changing the chart
to deploy a DaemonSet specifically.
Change-Id: Ie88ceadbf5113fc60e5bb0ddef09e18fe07a192c
This change is to address a memory leak in the ceph-mgr deployment.
The leak has also been noted in:
https://review.opendev.org/#/c/711085
Without this change memory usage for the active ceph-mgr pod will
steadily increase by roughly 100MiB per hour until all available
memory has been exhausted. Reset messages will also be seen in the
active and standby ceph-mgr pod logs.
Sample messages:
---
0 client.0 ms_handle_reset on v2:10.0.0.226:6808/1
0 client.0 ms_handle_reset on v2:10.0.0.226:6808/1
0 client.0 ms_handle_reset on v2:10.0.0.226:6808/1
---
The root cause of the resets and associated memory leak appears to
be due to multiple ceph pods sharing the same IP address (due to
hostNetwork being true) and PID (due to hostPID being false).
In the messages above the "1" at the end of the line is the PID.
Ceph appears to use the Version:IP:Port/PID (v2:10.0.0.226:6808/1)
tuple as a unique identifier. When hostPID is false conflicts arise.
Setting hostPID to true stops the reset messages and memory leak.
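The fix amounts to the following in the ceph-mgr pod spec (sketch; both keys are standard Kubernetes pod spec fields):

```yaml
spec:
  hostNetwork: true   # already true: all ceph pods share the node's IP
  hostPID: true       # new: use real host PIDs so v2:IP:Port/PID stays unique
```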
Change-Id: I9821637e75e8f89b59cf39842a6eb7e66518fa2c
1) Changed to the service account name instead of the traditional pod
name, to resolve correctly for dynamic release names.
Change-Id: Ibf4c69415e69a7baca2e3b96bcb23851e68d07d8
The PS updates the wait_for_inactive_pgs function:
- Changed the name of the function to wait_for_pgs
- Added a query for getting the status of pgs
- All pgs should be in an "active+" state on at least three consecutive checks
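The streak logic can be sketched as follows (hypothetical names and simulated status queries, not the PS code):

```shell
# Each string simulates one ceph pg status query; a pg counts as healthy
# only if its state starts with "active+", and any unhealthy pg resets
# the streak of consecutive healthy checks.
consecutive=0
for states in "active+clean peering" \
              "active+clean active+clean" \
              "active+clean active+clean" \
              "active+clean active+clean"; do
  unhealthy=0
  for s in ${states}; do
    case "${s}" in
      active+*) ;;          # healthy pg
      *) unhealthy=1 ;;     # e.g. "peering" resets the streak
    esac
  done
  if [ "${unhealthy}" -eq 0 ]; then
    consecutive=$((consecutive + 1))
  else
    consecutive=0
  fi
done
echo "consecutive healthy checks: ${consecutive}"
```

Requiring three consecutive healthy checks avoids declaring success on a single momentary "active+" reading while pgs are still settling.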
Change-Id: Iecc79ebbdfaa74886bca989b23f7741a1c3dca16
The PS adds a check of the target OSD value. The expected number of
OSDs should always be greater than or equal to the number of existing
OSDs. If there are more OSDs than expected, the value is not correct.
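The check can be sketched as (variable names and counts are illustrative assumptions, not the PS code):

```shell
# Refuse a target OSD count that is lower than what is already deployed
expected_osds=24
current_osds=12   # stand-in for a live query of deployed OSDs
if [ "${current_osds}" -gt "${expected_osds}" ]; then
  result="invalid target: expected ${expected_osds} < deployed ${current_osds}"
else
  result="target ok"
fi
echo "${result}"
```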
Change-Id: I117a189a18dbb740585b343db9ac9b596a34b929