This change updates many of the deployment scripts to properly
handle deploying each service via helm 3 and updates each job
to use the helm v3 install script.
Change-Id: I90a7b59231376b9179439c2554e46449d59b9c15
With the move to helm v3, helm status requires a namespace to be specified, but doing so breaks helm v2 compatability. This change removes the usage of helm serve in openstack-helm-infra's deployment scripts.
Change-Id: Ia600b979bf48629962577b3c5674bfa7415d78c0
The current implementation of the Ceph CSI provisioner is tied too
closely with the older Ceph RBD provisioner, which doesn't let the
deployer deploy Ceph CSI provisioner without the old RBD provisioner.
This patchset will decouple them such that they can be deployed
independently from one another.
A few other changes are needed as well:
1) The deployment/gate scripts are updated so that the old RBD and
CSI RBD provisioners are separately enabled/disabled as needed.
The original RBD provisioner is now deprecated.
2) Ceph-mon chart is updated because it had some RBD storageclass
data in values.yaml that is not needed for ceph-mon deployment.
3) Fixed a couple of bugs in job-cephfs-client-key.yaml where RBD
parameters were being used instead of cephfs parameters.
Change-Id: Icb5f78dcefa51990baf1b6d92411eb641c2ea9e2
About zookeeper chart,It's been removed,But there are still some related scripts that have not been completely deleted,we should remove them.
Change-Id: Iae20717482ad6c7a40f54174eef120d094abbd59
If the test pod still exists, then the new test run
fails with ERROR: pods "abc-test" already exists
So, Removing remaining test pods before new test run
Change-Id: I3b3ed5ceaf420aa39a669b4a50a838ad154b1fdd
Closes-Bug: #1882030
This chart could deploy fluentd either as a Deployment
or a Daemonset. Both options would use the deployment-fluentd
template with various sections toggled off based on values.yaml
I'd like to know - Does anyone run this chart as a Deployment?
We can simplify the chart, and zuul gates, by changing the chart
to deploy a Daemonset specifically.
Change-Id: Ie88ceadbf5113fc60e5bb0ddef09e18fe07a192c
This is to update ceph scripts to create loopback devices
in single script and also to update gate scripts.
Change-Id: Id6e3c09dca20d98fcbcc434e65f790c06b6272e8
This is to make ceph-volume as default deployment tool
since support for ceph-disk got deprectated from Nautilus version of
ceph.
Change-Id: I10f42fd0cb43a951f480594d269fd998de5678bf
- This is to make use of loopback devices for ceph osds since
support for directory backed osds going to deprecate.
- Move to bluestore from filestore for ceph-osds.
- Seperate DB and WAL partitions from data so that gates will validate
the scenario where we will have fast storage disk for DB and WAL.
Change-Id: Ief6de17c53d6cb57ef604895fdc66dc6c604fd89
This adds a chart for the node problem detector. This chart
will help provide additional insight into the status of the
underlying infrastructure of a deployment.
Updated the chart with new yamllint checks.
Change-Id: I21a24b67b121388107b20ab38ac7703c7a33f1c1
Signed-off-by: Steve Wilkerson <sw5822@att.com>
The current copyright refers to a non-existent group
"openstack helm authors" with often out-of-date references that
are confusing when adding a new file to the repo.
This change removes all references to this copyright by the
non-existent group and any blank lines underneath.
Change-Id: I1882738cf9757c5350a8533876fd37b5920b5235
This updates the deployment scripts for Prometheus to leverage the
feature gate functionality rather than bash generation of the list
of override files to use for alerting rules
Change-Id: Ie497ae930f7cc4db690a4ddc812a92e4491cde93
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This updates the osh-infra-logging single node job to omit the
fluentbit deployment step, as having multiple logging daemonsets
deployed to the single node jobs is causing IO issues. Also, it
was noted that the fluentd-deployment step was missing the
overrides to move the fluentd-deployment release from utilizing a
daemonset to a deployment. This resulted in 3 logging daemons
being deployed to a single host
Change-Id: I4a0c5550e6ea6a331aab0082a975f161e65704bf
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This removes the default dashboards from the Grafana chart and
instead places them in the values_overrides directory, similar to
what was done for the Prometheus rules. As Grafana dashboards
will likely be heavily dependent upon end-user needs, the old
default dashboard configs should only be used as a reference
instead of opinionated defaults that are difficult to override.
The previous defaults made using specialized labels for dashboard
variables difficult, as they were making dangerous assumptions
about deployed namespaces and host fqdns. By removing the defaults
entirely, end users can define their own dashboards to meet their
specialized needs
Change-Id: I7def8df68371deda0b75a685363c8a73b818dd45
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This patch set adds the feature gate capability to OpenStack-Helm-Infra
repository without depending on the main OpenStack-Helm repository.
Change-Id: I70b8fac4fd2365f8eedcf50519f125eb34534f2f
Signed-off-by: Tin Lam <tlam@omegaprime.dev>
Signed-off-by: Tin Lam <tin@irrational.io>
This updates the Elasticsearch and Kibana charts to deploy
version 7.1.0. This move required significant changes to both
charts, including: changing elasticsearch masters to a statefulset
to utilize reliable dns names for the discovery process, config
updates to reflect deprecated/updated/removed values, use the
kibana saved objects api for managing index patterns and setting
the default index, and updating the elasticsearch entrypoint
scripts to reflect the use of elastic-keystore for storing s3
credentials instead of defining them in the configuration file
Change-Id: I270d905f266fc15492e47d8376714ba80603e66d
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This proposes adding a zookeeper chart to osh-infra that aligns
with the design patterns laid out by the other charts in osh-infra
and osh.
Change-Id: I25edc58fc951e7f81f7275ade6cf9c97e0afae02
Signed-off-by: Steve Wilkerson <sw5822@att.com>
Co-Authored-By: Steven Fitzpatrick <steven.fitzpatrick@att.com>
This updates the ceph health check command in Nagios to use the
updated plugin that determines the active ceph-mgr instance
endpoint to use before querying for ceph's health. This results in
more robust and reliable reporting of ceph's overall health
Depends-On: https://review.opendev.org/#/c/693900/
Change-Id: I5eeb076e5af3c820dbdcc3cc321cefcb5f85ef8d
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This disables the cephfs provisioner in the multinode
periodic jobs. It seems the helm tests for the ceph
provisioner chart that test cephfs fail more often than
not in the multinode jobs while passing reliably in the
single node check and gate jobs. As cephfs is still
gated, disabling the cephfs provisioner in the periodic
jobs allows for further investigation into this issue
without causing potential regressions
Change-Id: I36e68cc2e446afac8769fb9ab753105909341f24
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This adds a cron job to manually verify all snapshot repositories
are registered to any active master and data nodes. This is to
address scenarios where master and data nodes do not have the
desired snapshot repositories registered following node outages
or reboots
Change-Id: Ie6f42e95c3ca4dc2ec70f2852a2bde11e59ec097
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This mvoes the default timeout for the ceph provisioners helm test
pod to 600 seconds, as 120 seconds is fairly aggressive. This
also adds the required --timeout flag to the helm test command in
each job for the ceph provisioners, as well as adding the required
helm test configuration to the armada-lma manifest
Change-Id: I5a3b98de9132fe83cf09b1e5b3fcc513bd496650
Signed-off-by: Steve Wilkerson <sw5822@att.com>
- Adding helm tests for Ceph provisioner chart
- Helm test should only executed when deploying chart with
client_secrets: true.
Co-Authored-By: Chinasubbareddy Mallavarapu <cr3938@att.com>
Change-Id: I33421249246dfaf6ea4f835e76a74813dfb3b595
This updates the Elastic Curator cron job to include configuration
for successful and failed job history limits, similar to the other
cron jobs we deploy. This also moves the key for configuring the
cron schedule from under .Values.conf.curator to a new top level
jobs key to maintain consistency
This also fixes an indentation issue with the deployment overrides
for Curator as well as adds the overrides for the Armada job
Change-Id: I9c720df9677215bdd2bf18be77959bd5f671c0ca
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This adds required changes to the Fluentd chart to allow for
deploying Fluentd as either a deployment or a daemonset. This
follows the pattern laid out by the ingress chart. This also
updates the single and multinode jobs to deploy fluentd as both
a daemonset and a deployment for validation
Change-Id: I84353a2daa2ce56ff59882a8d33203286ed27e06
Signed-off-by: Steve Wilkerson <sw5822@att.com>
This begins to split the fluent-logging chart into two separate
charts, one for fluentbit and one for fluentd. This is to help
isolate each chart and its dependencies better, and to treat each
service as its own entity.
This also moves the job for creating Elasticsearch templates to
the Elasticsearch chart, as the elasticsearch chart should have
ownership of creating the templates for its indices.
This also performs some general cleanup of values keys that are
not currently used
Change-Id: I827277d5faa62b8b59c5960330703d23c297ca47
Signed-off-by: Steve Wilkerson <sw5822@att.com>
- Update ceph-client chart to
1) By default, enable ceph-client helm test. Update enabler
key in values.yaml to follow pattern as in other charts
2) Add needed dependancy for ceph-client helm tests
3) Update helm test script to reduce output and update
error msgs
4) Removed unwanted ENV variables SPECS and EXPECTED_POOLMINSIZE
- Update gate scripts to run helm test command
Change-Id: I6a0e4f5107e49dac081ac2037bcc0f9c0864793f
This updates the scripts for deploying fluentd to include
overrides for enabling prometheus monitoring. Despite not
deploying prometheus in the osh-infra-logging job, we can still
leverage the post run job to gather metrics from the exporters
service. This gives us the means for verifying the functionality
of the exporter
Change-Id: Id98474de89d86419157635007e2f114f0947498e
This updates the Elasticsearch chart to allow for setting the
heap size per node type instead of for all nodes equally. This
also adds the required environment variable to configure whether
a node is an ingest node. This is set to false, as suggested for
elasticsearch versions <= 6.x
This also removes the ES_PLUGINS_INSTALL environment variable as
it is not used for anything in the current charts
Change-Id: I9096774db46dcbcd48b8a5448f0510984bf4108f
This PS adds a basic sanity test to the mariadb chart, using
mysqlslap.
Change-Id: I7450ea8a66364d123022bc773ee90047f9e69b1c
Signed-off-by: Pete Birley <pete@port.direct>
- Move the cronjob from ceph-mon to ceph-client
- Adding ceph-rbd-pool job as dependencies for cronjob
- checkPGs manifest set to true so it will always run
in gate.
Co-Authored-By: Chinasubbareddy Mallavarapu <cr3938@att.com>,
Renis Makadia <renis.makadia@att.com>
Change-Id: I9855d8d22265e78c7e2f5fa7ece69c9ff532ecb2
This adds configuration overrides for a very basic Curator action
that should effectively be a no-op. This is to address periodic
failures seen in the osh-infra-aio-logging job that appear when
the run times coincide with Elastic Curator's cron schedule (every
six hours). This ensures curator actions are defined in cases
where this occurs
Change-Id: Ia2255ada2f32f21888bd4ca96df88496720fd0a5
This PS extends the gate scripts to allow ceph to be deployed from
a workstation external to the k8s cluster.
Change-Id: I09b9a11747bab32c19637d8dd076b8caa3b89445
Signed-off-by: Pete Birley <pete@port.direct>
There is no "make {package}" line in 030-ceph.sh file.
It causes a failure to execute the shell script.
Change-Id: If787abd7711a02313b6a2acae8a888b5609f27df
This updates the mariadb chart to use the correct auth values for
the mariadb prometheus exporter. The correct credentials to use
are the credentials in the oslo_db endpoint
Change-Id: I2d325167d7ffdf911a56fe97b879cb13b0d4c195
Fix a naming issue with the cronjob's binary, and schedule the cron
job to run every 15 minutes for the gates. Additonally check to
to ensure we are only running on block devices. Also update the
script to work with ceph-volume created devices.
Change-Id: I8aedab0ac41c191ef39a08034fff3278027d7520
This adds xxx-job name prefixes to the Selenium jobs for consistency
This will also remove the "|| true" suffix that was added temporarily to
ensure the Kibana selenium job did not error. The fix for the issue
was merged so the quick fix is no longer needed and may prevent an
error when an issue actually occurs.
Change-Id: I16881974cbf618b31813964b17c090dbfe33fe51
This proposes moving the multinode job to a periodic job to
match the approach used in the openstack-helm repo.
This also adds the openstack-exporter to the aio monitoring job as
it was previously missing.
This also proposes moving the aio-logging and aio-monitoring jobs
to voting
Change-Id: Idcd4544e03facdcd2430683b66bd80c79e73a372