78 Commits

Author SHA1 Message Date
Gage Hugo
79d75267ea Move osh-infra jobs to use helm3
This change updates many of the deployment scripts to properly
handle deploying each service via helm 3 and updates each job
to use the helm v3 install script.

Change-Id: I90a7b59231376b9179439c2554e46449d59b9c15
2022-03-24 13:05:42 -05:00
jayonlau
60a9540e0b Remove helm status from deployment scripts for multinode
With the move to helm v3, helm status requires a namespace to be specified, but doing so breaks helm v2 compatibility. This change removes the usage of helm status in openstack-helm-infra's deployment scripts.

Change-Id: Ia600b979bf48629962577b3c5674bfa7415d78c0
2021-10-13 12:25:08 -04:00
Parsons, Cliff (cp769u)
d9404f89c2 Enable Ceph CSI Provisioner to Stand Alone
The current implementation of the Ceph CSI provisioner is tied too
closely with the older Ceph RBD provisioner, which doesn't let the
deployer deploy Ceph CSI provisioner without the old RBD provisioner.

This patchset will decouple them such that they can be deployed
independently from one another.

A few other changes are needed as well:
1) The deployment/gate scripts are updated so that the old RBD and
   CSI RBD provisioners are separately enabled/disabled as needed.
   The original RBD provisioner is now deprecated.
2) Ceph-mon chart is updated because it had some RBD storageclass
   data in values.yaml that is not needed for ceph-mon deployment.
3) Fixed a couple of bugs in job-cephfs-client-key.yaml where RBD
   parameters were being used instead of cephfs parameters.

Change-Id: Icb5f78dcefa51990baf1b6d92411eb641c2ea9e2
2021-06-15 14:48:09 +00:00
jinyuanliu
5c6d281b62 Remove zookeeper residue
The zookeeper chart has been removed, but some related scripts were not completely deleted. This change removes them.

Change-Id: Iae20717482ad6c7a40f54174eef120d094abbd59
2021-03-01 14:31:03 +08:00
jinyuan
5549eb0d31 Remove Alerta residue
The Alerta chart has been removed, but two related scripts remain. This change removes them.

Change-Id: I859a8713422f6d4c5df79d2b01f54c89dcdfa0b4
2021-02-20 09:40:09 +00:00
Zuul
754d8e93b4 Merge "Add Alerta feature to osh-infra" 2020-08-19 21:19:18 +00:00
Xiaoguang(William) Zhang
83a55fd19e Add Alerta feature to osh-infra
Change-Id: Id8dc3f86b8d6754df4ba3c0c720a78731e3f54d5
2020-08-19 13:35:40 +00:00
Zuul
622bc4d972 Merge "Remove remaining test pods before new test run" 2020-08-18 16:21:00 +00:00
Gayathri Devi Kathiri
a57190fd8a Remove remaining test pods before new test run
If the test pod still exists, the new test run fails with
ERROR: pods "abc-test" already exists

So remove any remaining test pods before a new test run.

Change-Id: I3b3ed5ceaf420aa39a669b4a50a838ad154b1fdd
Closes-Bug: #1882030
2020-08-13 18:12:25 +00:00
Xiaoguang(William) Zhang
7c94deae43 Update alertmanager to include snmp_notifier function
Change-Id: I5aedbdcdbba397a9fddde19a0898cb91de08553a
2020-08-07 12:25:33 -04:00
Zuul
3d76931e55 Merge "Fluentd: Remove Deployment Option" 2020-08-04 21:06:26 +00:00
Steven Fitzpatrick
959417f321 Fluentd: Remove Deployment Option
This chart could deploy fluentd either as a Deployment
or a Daemonset. Both options would use the deployment-fluentd
template with various sections toggled off based on values.yaml

I'd like to know - Does anyone run this chart as a Deployment?
We can simplify the chart, and zuul gates, by changing the chart
to deploy a Daemonset specifically.

Change-Id: Ie88ceadbf5113fc60e5bb0ddef09e18fe07a192c
2020-08-04 19:06:37 +00:00
Chinasubbareddy Mallavarapu
4358251073 [CEPH] OSH-INFRA: Update ceph scripts to create loopback devices
This updates the ceph scripts to create loopback devices
in a single script and also updates the gate scripts.

Change-Id: Id6e3c09dca20d98fcbcc434e65f790c06b6272e8
2020-07-29 10:05:37 -05:00
Chinasubbareddy Mallavarapu
bfe7a99a61 [CEPH] Make ceph-volume as default deployment tool
This makes ceph-volume the default deployment tool, since
support for ceph-disk was deprecated as of the Nautilus version of
ceph.

Change-Id: I10f42fd0cb43a951f480594d269fd998de5678bf
2020-07-02 15:05:03 +00:00
Chinasubbareddy Mallavarapu
3bde9f5b90 [CEPH] OSH-INFRA: use loopback devices for ceph osds
- Use loopback devices for ceph osds, since support for
directory-backed osds is going to be deprecated.

- Move to bluestore from filestore for ceph-osds.
- Separate DB and WAL partitions from data so that gates will validate
  the scenario where we will have fast storage disk for DB and WAL.

Change-Id: Ief6de17c53d6cb57ef604895fdc66dc6c604fd89
2020-06-29 14:09:32 +00:00
Steve Wilkerson
a31bb2b049 Add node-problem-detector chart
This adds a chart for the node problem detector. This chart
will help provide additional insight into the status of the
underlying infrastructure of a deployment.

Updated the chart with new yamllint checks.

Change-Id: I21a24b67b121388107b20ab38ac7703c7a33f1c1
Signed-off-by: Steve Wilkerson <sw5822@att.com>
2020-06-22 13:00:55 -05:00
Gage Hugo
d14d826b26 Remove OSH Authors copyright
The current copyright refers to a non-existent group
"openstack helm authors" with often out-of-date references that
are confusing when adding a new file to the repo.

This change removes all references to this copyright by the
non-existent group and any blank lines underneath.

Change-Id: I1882738cf9757c5350a8533876fd37b5920b5235
2020-05-07 02:11:15 +00:00
Steve Wilkerson
ddd5a74319 Prometheus: Add feature-gate support in deployment scripts
This updates the deployment scripts for Prometheus to leverage the
feature gate functionality rather than bash generation of the list
of override files to use for alerting rules

Change-Id: Ie497ae930f7cc4db690a4ddc812a92e4491cde93
Signed-off-by: Steve Wilkerson <sw5822@att.com>
2020-01-07 22:06:19 +00:00
Steve Wilkerson
edd6ffd712 Reduce osh-infra-logging job scope
This updates the osh-infra-logging single node job to omit the
fluentbit deployment step, as having multiple logging daemonsets
deployed to the single node jobs is causing IO issues. Also, it
was noted that the fluentd-deployment step was missing the
overrides to move the fluentd-deployment release from utilizing a
daemonset to a deployment. This resulted in 3 logging daemons
being deployed to a single host

Change-Id: I4a0c5550e6ea6a331aab0082a975f161e65704bf
Signed-off-by: Steve Wilkerson <sw5822@att.com>
2019-12-17 12:43:12 -06:00
Steve Wilkerson
3a6df3b544 Grafana: Remove default dashboards from chart
This removes the default dashboards from the Grafana chart and
instead places them in the values_overrides directory, similar to
what was done for the Prometheus rules. As Grafana dashboards
will likely be heavily dependent upon end-user needs, the old
default dashboard configs should only be used as a reference
instead of opinionated defaults that are difficult to override.
The previous defaults made using specialized labels for dashboard
variables difficult, as they were making dangerous assumptions
about deployed namespaces and host fqdns. By removing the defaults
entirely, end users can define their own dashboards to meet their
specialized needs

Change-Id: I7def8df68371deda0b75a685363c8a73b818dd45
Signed-off-by: Steve Wilkerson <sw5822@att.com>
2019-12-09 13:39:13 +00:00
Zuul
bb7c2787c3 Merge "Elasticsearch/Kibana: Update version to 7.1.0" 2019-12-06 18:16:05 +00:00
Tin Lam
daefed7218 Add feature gate capability to OSH-Infra
This patch set adds the feature gate capability to OpenStack-Helm-Infra
repository without depending on the main OpenStack-Helm repository.

Change-Id: I70b8fac4fd2365f8eedcf50519f125eb34534f2f
Signed-off-by: Tin Lam <tlam@omegaprime.dev>
Signed-off-by: Tin Lam <tin@irrational.io>
2019-12-03 16:55:00 -06:00
Steve Wilkerson
2d3c9575ff Elasticsearch/Kibana: Update version to 7.1.0
This updates the Elasticsearch and Kibana charts to deploy
version 7.1.0. This move required significant changes to both
charts, including: changing elasticsearch masters to a statefulset
to utilize reliable dns names for the discovery process, config
updates to reflect deprecated/updated/removed values, use the
kibana saved objects api for managing index patterns and setting
the default index, and updating the elasticsearch entrypoint
scripts to reflect the use of elastic-keystore for storing s3
credentials instead of defining them in the configuration file

Change-Id: I270d905f266fc15492e47d8376714ba80603e66d
Signed-off-by: Steve Wilkerson <sw5822@att.com>
2019-12-03 07:43:29 -06:00
Steve Wilkerson
608d75ec8d Add zookeeper chart to osh-infra
This proposes adding a zookeeper chart to osh-infra that aligns
with the design patterns laid out by the other charts in osh-infra
and osh.

Change-Id: I25edc58fc951e7f81f7275ade6cf9c97e0afae02
Signed-off-by: Steve Wilkerson <sw5822@att.com>
Co-Authored-By: Steven Fitzpatrick <steven.fitzpatrick@att.com>
2019-11-14 19:51:20 +00:00
Steve Wilkerson
59dac085ce Nagios: Update ceph health check command
This updates the ceph health check command in Nagios to use the
updated plugin that determines the active ceph-mgr instance
endpoint to use before querying for ceph's health. This results in
more robust and reliable reporting of ceph's overall health

Depends-On: https://review.opendev.org/#/c/693900/

Change-Id: I5eeb076e5af3c820dbdcc3cc321cefcb5f85ef8d
Signed-off-by: Steve Wilkerson <sw5822@att.com>
2019-11-13 08:51:26 -06:00
Steve Wilkerson
d547063c37 Disable cephfs provisioner in multinode jobs
This disables the cephfs provisioner in the multinode
periodic jobs. It seems the helm tests for the ceph
provisioner chart that test cephfs fail more often than
not in the multinode jobs while passing reliably in the
single node check and gate jobs. As cephfs is still
gated, disabling the cephfs provisioner in the periodic
jobs allows for further investigation into this issue
without causing potential regressions

Change-Id: I36e68cc2e446afac8769fb9ab753105909341f24
Signed-off-by: Steve Wilkerson <sw5822@att.com>
2019-08-13 14:49:27 +00:00
Steve Wilkerson
bc20c6c8b6 Elasticsearch: Add cron job to verify snapshot repositories
This adds a cron job to manually verify all snapshot repositories
are registered to any active master and data nodes. This is to
address scenarios where master and data nodes do not have the
desired snapshot repositories registered following node outages
or reboots

Change-Id: Ie6f42e95c3ca4dc2ec70f2852a2bde11e59ec097
Signed-off-by: Steve Wilkerson <sw5822@att.com>
2019-08-02 02:02:14 +00:00
Steve Wilkerson
ae3c07b853 Ceph: Update default test pod timeout for provisioners
This moves the default timeout for the ceph provisioners helm test
pod to 600 seconds, as 120 seconds is fairly aggressive.  This
also adds the required --timeout flag to the helm test command in
each job for the ceph provisioners, as well as adding the required
helm test configuration to the armada-lma manifest

Change-Id: I5a3b98de9132fe83cf09b1e5b3fcc513bd496650
Signed-off-by: Steve Wilkerson <sw5822@att.com>
2019-07-12 13:43:38 +00:00
Renis Makadia
c7f5c9979c Add helm tests for Ceph Provisioners chart
- Adding helm tests for Ceph provisioner chart
- Helm tests should only be executed when deploying the chart with
client_secrets: true.

Co-Authored-By: Chinasubbareddy Mallavarapu <cr3938@att.com>

Change-Id: I33421249246dfaf6ea4f835e76a74813dfb3b595
2019-06-12 12:32:30 -05:00
Steve Wilkerson
0a8b710083 Elasticsearch: Add job history to Curator, update schedule key
This updates the Elastic Curator cron job to include configuration
for successful and failed job history limits, similar to the other
cron jobs we deploy. This also moves the key for configuring the
cron schedule from under .Values.conf.curator to a new top level
jobs key to maintain consistency

This also fixes an indentation issue with the deployment overrides
for Curator as well as adds the overrides for the Armada job

Change-Id: I9c720df9677215bdd2bf18be77959bd5f671c0ca
Signed-off-by: Steve Wilkerson <sw5822@att.com>
2019-05-30 15:28:30 +00:00
Steve Wilkerson
bdaf866a4e Fluentd: Support Daemonset deployment
This adds required changes to the Fluentd chart to allow for
deploying Fluentd as either a deployment or a daemonset. This
follows the pattern laid out by the ingress chart. This also
updates the single and multinode jobs to deploy fluentd as both
a daemonset and a deployment for validation

Change-Id: I84353a2daa2ce56ff59882a8d33203286ed27e06
Signed-off-by: Steve Wilkerson <sw5822@att.com>
2019-05-28 08:23:44 -05:00
Steve Wilkerson
abb5e0f713 Separate fluentbit and fluentd charts
This begins to split the fluent-logging chart into two separate
charts, one for fluentbit and one for fluentd. This is to help
isolate each chart and its dependencies better, and to treat each
service as its own entity.

This also moves the job for creating Elasticsearch templates to
the Elasticsearch chart, as the elasticsearch chart should have
ownership of creating the templates for its indices.

This also performs some general cleanup of values keys that are
not currently used

Change-Id: I827277d5faa62b8b59c5960330703d23c297ca47
Signed-off-by: Steve Wilkerson <sw5822@att.com>
2019-05-24 06:31:09 -05:00
Renis Makadia
5985b61286 Ceph-Client: Update, Enable and Cleanup helm tests
- Update ceph-client chart to
1) By default, enable ceph-client helm test. Update enabler
key in values.yaml to follow pattern as in other charts
2) Add needed dependency for ceph-client helm tests
3) Update helm test script to reduce output and update
error msgs
4) Removed unwanted ENV variables SPECS and EXPECTED_POOLMINSIZE
- Update gate scripts to run helm test command

Change-Id: I6a0e4f5107e49dac081ac2037bcc0f9c0864793f
2019-05-18 03:09:45 +00:00
Steve Wilkerson
7c093716ca Enable fluentd monitoring in single and multinode jobs
This updates the scripts for deploying fluentd to include
overrides for enabling prometheus monitoring. Despite not
deploying prometheus in the osh-infra-logging job, we can still
leverage the post run job to gather metrics from the exporters
service. This gives us the means for verifying the functionality
of the exporter

Change-Id: Id98474de89d86419157635007e2f114f0947498e
2019-05-10 01:18:13 +00:00
Steve Wilkerson
031ee3e6af Elasticsearch: Heap configuration and ingest node updates
This updates the Elasticsearch chart to allow for setting the
heap size per node type instead of for all nodes equally. This
also adds the required environment variable to configure whether
a node is an ingest node. This is set to false, as suggested for
elasticsearch versions <= 6.x

This also removes the ES_PLUGINS_INSTALL environment variable as
it is not used for anything in the current charts

Change-Id: I9096774db46dcbcd48b8a5448f0510984bf4108f
2019-05-06 14:55:45 -05:00
Pete Birley
137b60e599 MariaDB: add basic sanity test
This PS adds a basic sanity test to the mariadb chart, using
mysqlslap.

Change-Id: I7450ea8a66364d123022bc773ee90047f9e69b1c
Signed-off-by: Pete Birley <pete@port.direct>
2019-04-06 13:18:41 -04:00
Steve Taylor
65de349d58 Move ceph-mon's checkPGs cron job to ceph-client
- Move the cronjob from ceph-mon to ceph-client
- Add the ceph-rbd-pool job as a dependency for the cronjob
- Set the checkPGs manifest to true so it will always run
in the gate.

Co-Authored-By: Chinasubbareddy Mallavarapu <cr3938@att.com>,
                Renis Makadia <renis.makadia@att.com>

Change-Id: I9855d8d22265e78c7e2f5fa7ece69c9ff532ecb2
2019-03-19 20:53:08 +00:00
Zuul
a831841716 Merge "Gate: Permit ceph deployment from outside the cluster" 2019-03-15 15:19:35 +00:00
Steve Wilkerson
588acdbf8c Elastic Curator: Add basic action overrides for deployment jobs
This adds configuration overrides for a very basic Curator action
that should effectively be a no-op. This is to address periodic
failures seen in the osh-infra-aio-logging job that appear when
the run times coincide with Elastic Curator's cron schedule (every
six hours). This ensures curator actions are defined in cases
where this occurs

Change-Id: Ia2255ada2f32f21888bd4ca96df88496720fd0a5
2019-03-15 13:20:55 +00:00
Pete Birley
d6a0e0b85c Gate: Permit ceph deployment from outside the cluster
This PS extends the gate scripts to allow ceph to be deployed from
a workstation external to the k8s cluster.

Change-Id: I09b9a11747bab32c19637d8dd076b8caa3b89445
Signed-off-by: Pete Birley <pete@port.direct>
2019-03-15 13:20:19 +00:00
Chinasubbareddy M
babe91b75e ceph-rgw: Add network policy for ceph-rgw pods
This adds an ingress network policy for ceph-rgw pods.

Change-Id: I32a5d3d9a05b920bc69d5b5bb5a2d27cf6f55542
2019-03-06 03:08:34 +00:00
John Haan
b7a96ca8c9 Fix for absent link packages in ceph deployment shell
There is no "make {package}" line in the 030-ceph.sh file,
which causes the shell script to fail when executed.

Change-Id: If787abd7711a02313b6a2acae8a888b5609f27df
2019-02-19 02:27:21 +09:00
Steve Wilkerson
6e2ea01ae0 Mariadb: Use correct credentials for exporter in secret
This updates the mariadb chart to use the correct auth values for
the mariadb prometheus exporter. The correct credentials to use
are the credentials in the oslo_db endpoint

Change-Id: I2d325167d7ffdf911a56fe97b879cb13b0d4c195
2019-02-04 06:23:33 -06:00
Zuul
6ef3f58fb8 Merge "Add pre-fixes to the Selenium jobs and remove "|| true"" 2019-01-31 20:39:40 +00:00
Zuul
b30012a616 Merge "[CEPH] Fixes for the OSD defrag cronjob" 2019-01-31 16:05:14 +00:00
Matthew Heler
fc76091261 [CEPH] Fixes for the OSD defrag cronjob
Fix a naming issue with the cronjob's binary, and schedule the cron
job to run every 15 minutes for the gates. Additionally, check
to ensure we are only running on block devices. Also update the
script to work with ceph-volume created devices.

Change-Id: I8aedab0ac41c191ef39a08034fff3278027d7520
2019-01-31 06:13:05 -06:00
Chris Wedgwood
b7b7c5ea44 [alertmanager] default to 1 replica, multinode gate uses 3
Change-Id: Ifb1420f8dcf7237349a79f1f97aea5e547bafeab
2019-01-30 08:43:18 +00:00
Meg Heisler
98fbc9a1e2 Add pre-fixes to the Selenium jobs and remove "|| true"
This adds xxx-job name prefixes to the Selenium jobs for consistency

This will also remove the "|| true" suffix that was added temporarily to
ensure the Kibana selenium job did not error. The fix for the issue
was merged, so the quick fix is no longer needed and may mask an
error when an issue actually occurs.

Change-Id: I16881974cbf618b31813964b17c090dbfe33fe51
2019-01-29 20:24:57 -06:00
Steve Wilkerson
1e40765d88 OSH-Infra: Update multinode and aio-monitoring/logging jobs
This proposes moving the multinode job to a periodic job to
match the approach used in the openstack-helm repo.

This also adds the openstack-exporter to the aio monitoring job as
it was previously missing.

This also proposes moving the aio-logging and aio-monitoring jobs
to voting

Change-Id: Idcd4544e03facdcd2430683b66bd80c79e73a372
2019-01-23 08:49:48 -06:00
Zuul
958127477d Merge "Additional Selenium tests for Kibana dashboard" 2019-01-17 23:46:14 +00:00