A new value "rename" has been added to the Ceph pool spec to allow
pools to be renamed in a brownfield deployment. For a greenfield
deployment the pool will be created and renamed in a single
deployment step, and for a brownfield deployment in which the pool
has already been renamed, no changes will be made to pool names.
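For illustration, with hypothetical pool names, the rename logic
boils down to something like:
---
if ceph osd pool ls | grep -qx "old-pool"; then
  ceph osd pool rename "old-pool" "new-pool"
fi
---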
Change-Id: I3fba88d2f94e1c7102af91f18343346a72872fde
The current pool init job only allows PGs to be found in the
"peering" or "activating" (or active) states, but it should also
allow the other possible states that can occur while the PG
autoscaler is running ("unknown", "creating", and "recover").
The helm test already allows these states, so the pool init
job is being changed to allow them as well, for consistency.
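For example, a rough way to count PGs outside the allowed states
(the JSON layout of "ceph pg ls" may differ between Ceph releases):
---
ceph pg ls -f json | jq -r '.pg_stats[].state' \
  | grep -vE 'active|peering|activating|creating|unknown|recover' \
  | wc -l
---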
Change-Id: Ib2c19a459c6a30988e3348f8d073413ed687f98b
This patchset makes the current ceph-client helm test more specific
about checking each of the PGs that are transitioning through inactive
states during the test. If any single PG spends more than 30 seconds in
any of these inactive states (peering, activating, creating, unknown,
etc), then the test will fail.
Also, once the three-minute PG checking period has expired, the
helm test will no longer fail, as it is very possible that the
autoscaler is still adjusting the PGs for several minutes after a
deployment is done.
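A rough sketch of the per-PG timing idea (not the chart's exact
code; assumes bash 4+, jq, and the Nautilus "ceph pg ls" JSON
layout):
---
declare -A inactive_since   # epoch second each PG was first seen inactive
while true; do
  now=$(date +%s)
  all_active=true
  while read -r pgid state; do
    if [[ "${state}" == *active* ]]; then
      unset "inactive_since[${pgid}]"
    else
      all_active=false
      : "${inactive_since[${pgid}]:=${now}}"
      if (( now - ${inactive_since[${pgid}]} > 30 )); then
        echo "PG ${pgid} stuck in ${state} for more than 30 seconds"
        exit 1
      fi
    fi
  done < <(ceph pg ls -f json | jq -r '.pg_stats[] | "\(.pgid) \(.state)"')
  ${all_active} && break
  sleep 5
done
---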
Change-Id: I7f3209b7b3399feb7bec7598e6e88d7680f825c4
This patchset adds the capability to configure the
Ceph RBD pool job to leave failed pods behind for debugging
purposes, if desired. The default is to not leave them
behind, which is the current behavior.
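When enabled, the leftover pods can then be inspected, e.g.
(the "ceph" namespace and pod name are placeholders):
---
kubectl -n ceph get pods --field-selector=status.phase=Failed
kubectl -n ceph logs <failed-pod-name>
---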
Change-Id: Ife63b73f89996d59b75ec617129818068b060d1c
This patch resolves a helm test problem where the test was failing
if it found a PG state of "activating". It could also potentially
find a number of other states, such as premerge or unknown, that
would likewise fail the test. Note that if these transient PG states are
found for more than 3 minutes, the helm test fails.
Change-Id: I071bcfedf7e4079e085c2f72d2fbab3adc0b027c
When autoscaling is disabled after pools are created, there is an
opportunity for some autoscaling to take place before autoscaling
is disabled. This change checks to see if autoscaling needs to be
disabled before creating pools, then checks to see if it needs to
be enabled after creating pools. This ensures that autoscaling
won't happen when the autoscaler is disabled and that autoscaling
won't start prematurely while pools are being created when it is enabled.
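For example, using the global mgr module toggle (the chart may
instead use the per-pool pg_autoscale_mode setting):
---
ceph mgr module disable pg_autoscaler   # before pool creation, if disabled in values
# ... create and tune pools ...
ceph mgr module enable pg_autoscaler    # after pool creation, if enabled in values
---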
Change-Id: I8803b799b51735ecd3a4878d62be45ec50bbbe19
The autoscaler was introduced in the Nautilus release. This
change only sets the pg_num value for a pool if the autoscaler
is disabled or the Ceph release is earlier than Nautilus.
When pools are created with the autoscaler enabled, a pg_num_min
value specifies the minimum value of pg_num that the autoscaler
will target. That default was recently changed from 8 to 32,
which severely limits the number of pools in a small cluster (see
https://github.com/rook/rook/issues/5091). This change overrides
the default pg_num_min value of 32 with a value of 8 (matching
the default pg_num value of 8) using the optional --pg-num-min
<value> argument at pool creation time and by setting the
pg_num_min value on existing pools.
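For example, with a hypothetical pool name:
---
ceph osd pool create my-pool 8 8 --pg-num-min 8   # new pools
ceph osd pool set my-pool pg_num_min 8            # existing pools
---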
Change-Id: Ie08fb367ec8b1803fcc6e8cd22dc8da43c90e5c4
Currently pool quotas and pg_num calculations are both based on
percent_total_data values. This can be problematic when the amount
of data allowed in a pool doesn't necessarily match the percentage
of the cluster's data expected to be stored in the pool. It is
also more intuitive to define absolute quotas for pools.
This change adds an optional pool_quota value that defines an
explicit value in bytes to be used as a pool quota. If pool_quota
is omitted for a given pool, that pool's quota is set to 0 (no
quota).
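For example, a 10 GiB quota on a hypothetical pool:
---
ceph osd pool set-quota my-pool max_bytes 10737418240
ceph osd pool set-quota my-pool max_bytes 0   # removes the quota again
---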
A check_pool_quota_target() Helm test has also been added to
verify that the sum of all pool quotas does not exceed the target
quota defined for the cluster if present.
Change-Id: I959fb9e95d8f1e03c36e44aba57c552a315867d0
This reverts commit 910ed906d0.
Reason for revert: May be causing upstream multinode gates to fail.
Change-Id: I1ea7349f5821b549d7c9ea88ef0089821eff3ddf
The wait_for_pgs() function in the rbd pool job waits for all PGs
to become active before proceeding, but in the event of an upgrade
that decreases pg_num values on one or more pools, it sees PGs in
the clean+premerge+peered state as peering and waits for "peering"
to complete. Since these PGs are in the process of merging into
active PGs, waiting for the merge to complete is unnecessary. This
change will reduce the wait time in this job significantly in
these cases.
Change-Id: I9a2985855a25cdb98ef6fe011ba473587ea7a4c9
The 'ceph pg dump_stuck' command that looks for PGs that are stuck
inactive doesn't include the 'inactive' keyword, so it also finds
PGs that are active that it believes are stuck. This change adds
the 'inactive' keyword to the command so only inactive PGs are
considered.
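For example:
---
ceph pg dump_stuck            # may also report stuck-but-active PGs
ceph pg dump_stuck inactive   # reports only PGs stuck in inactive states
---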
Change-Id: Id276deb3e5cb8c7e30f5a55140b8dbba52a33900
This commit introduces the following helm test improvements for the
ceph-client chart:
1) Reworks the pg_validation function so that it allows some time for
peering PGs to finish peering, but fails if any other critical errors are
seen. The actual PG validation was split out into a function called
check_pgs(), and the pg_validation function manages the looping aspects.
2) The check_cluster_status function now calls pg_validation if the
cluster status is not OK. This is very similar to what was happening
before, except that now the logic is not repeated.
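Roughly (the retry count and sleep interval here are illustrative,
not the chart's actual values):
---
pg_validation() {
  local retries=0
  until check_pgs; do
    retries=$((retries + 1))
    if [ "${retries}" -ge 10 ]; then
      return 1
    fi
    sleep 30
  done
}
---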
Change-Id: I65906380817441bd2ff9ff9cfbf9586b6fdd2ba7
This PS is to address security best practices concerning running
containers as a non-privileged user and disallowing privilege
escalation. Ceph-client is used for the mgr and mds pods.
Change-Id: Idbd87408c17907eaae9c6398fbc942f203b51515
This fixes the logic that disables the autoscaler on pools, as
it does not consider newly created pools.
Change-Id: I76fe106918d865b6443453b13e3a4bd6fc35206a
Since we introduced the chart version check in gates, requirements are
no longer satisfied by a strict check for 0.1.0.
Change-Id: I15950b735b4f8566bc0018fe4f4ea9ba729235fc
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
This corrects an issue in the create_pool function with checking
if the pg autoscaler should be enabled.
Change-Id: Id9be162fd59cc452477f5cc5c5698de7ae5bb141
Added chart linting in the Zuul CI to enhance the stability of the charts.
Fixed some lint errors in the current charts.
Change-Id: I9df4024c7ccf8b3510e665fc07ba0f38871fcbdb
The PS updates the queries in the wait_for_pgs function (pool init
script) to handle cases where PGs are in the "activating" or
"peered" state.
Change-Id: Ie93797fcb72462f61bca3a007f6649ab46ef4f97
This exports the Ceph cluster name as an environment variable,
since it is referenced by scripts.
It also fixes the query that gets inactive PGs.
Change-Id: I1db5cfbd594c0cc6d54f748f22af5856d9594922
The PS updates queries in the wait_for_pgs function in the ceph-client
and ceph-osd charts. It allows the status of PGs to be checked more
accurately. The output of the "ceph pg ls" command may contain many PG
states, such as "active+clean", "active+undersized+degraded",
"active+recovering", "peering", etc. But along with these there may also
be states such as "stale+active+clean". To avoid misinterpreting the
status of the PGs, the filter was changed from "startswith(active+)" to
"contains(active)".
The PS also adds a delay to the post-apply job after the pods restart.
This reduces the number of unnecessary queries to Kubernetes.
Change-Id: I0eff2ce036ad543bf2554bd586c2a2d3e91c052b
The recently-added crush weight comparison in reweight_osds() that
checks weights for zero isn't working correctly because the
expected weight is being calculated to two decimal places and then
compared against "0" as a string. This updates the comparison
string to "0.00" to match the calculation.
Change-Id: I29387a597a21180bb7fba974b4daeadf6ffc182d
If circumstances are such that the reweight function believes
OSD disks have zero size, refrain from reweighting OSDs to 0.
This can happen if OSDs are deployed with the noup flag set.
Also move the setting and unsetting of flags above this
calculation as an additional precautionary measure.
Change-Id: Ibc23494e0e75cfdd7654f5c0d3b6048b146280f7
This change is to address a memory leak in the ceph-mgr deployment.
The leak has also been noted in:
https://review.opendev.org/#/c/711085
Without this change memory usage for the active ceph-mgr pod will
steadily increase by roughly 100MiB per hour until all available
memory has been exhausted. Reset messages will also be seen in the
active and standby ceph-mgr pod logs.
Sample messages:
---
0 client.0 ms_handle_reset on v2:10.0.0.226:6808/1
0 client.0 ms_handle_reset on v2:10.0.0.226:6808/1
0 client.0 ms_handle_reset on v2:10.0.0.226:6808/1
---
The root cause of the resets and associated memory leak appears to
be due to multiple ceph pods sharing the same IP address (due to
hostNetwork being true) and PID (due to hostPID being false).
In the messages above the "1" at the end of the line is the PID.
Ceph appears to use the Version:IP:Port/PID (v2:10.0.0.226:6808/1)
tuple as a unique identifier. When hostPID is false conflicts arise.
Setting hostPID to true stops the reset messages and memory leak.
Change-Id: I9821637e75e8f89b59cf39842a6eb7e66518fa2c
The PS updates the wait_for_inactive_pgs function:
- Renamed the function to wait_for_pgs
- Added a query for getting the status of PGs
- All PGs must be found in an "active+" state at least three times in a
  row (see the sketch below)
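A minimal sketch of the three-in-a-row check (query details are
illustrative):
---
consecutive=0
while [ "${consecutive}" -lt 3 ]; do
  inactive=$(ceph pg ls -f json | jq -r \
    '[.pg_stats[] | select(.state | startswith("active+") | not)] | length')
  if [ "${inactive}" -eq 0 ]; then
    consecutive=$((consecutive + 1))
  else
    consecutive=0
  fi
  sleep 3
done
---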
Change-Id: Iecc79ebbdfaa74886bca989b23f7741a1c3dca16
The PS adds a check of the target OSD value. The expected number of OSDs
should always be greater than or equal to the number of existing OSDs.
If there are more OSDs than expected, the value is not correct.
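For example (TARGET_OSDS is a placeholder for the configured value):
---
EXISTING_OSDS=$(ceph osd ls | wc -l)
if [ "${TARGET_OSDS}" -lt "${EXISTING_OSDS}" ]; then
  echo "target OSD count ${TARGET_OSDS} is lower than the ${EXISTING_OSDS} already deployed"
  exit 1
fi
---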
Change-Id: I117a189a18dbb740585b343db9ac9b596a34b929
Currently the Ceph helm tests pass when the deployed Ceph cluster
is unhealthy. This change expands the cluster status testing
logic to pass when all PGs are active and fail if any PG is
inactive.
The PG autoscaler is currently causing the deployment to deploy
unhealthy Ceph clusters. This change also disables it. It should
be re-enabled once those issues are resolved.
Change-Id: Iea1ff5006fc00e4570cf67c6af5ef6746a538058
The PS adds a noup flag check to the ceph-client and ceph-osd helm tests.
It allows the tests to pass successfully even if the noup flag is set.
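One way to detect the flag (output format may vary by release):
---
if ceph osd dump | grep ^flags | grep -q noup; then
  echo "noup flag is set"
fi
---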
Change-Id: Ida43d83902d26bef3434c47e71959bb2086ad82a
The PS adds a check of the OSD count. It ensures that the expected
number of OSDs is present at the moment a pool is created.
The expected number of OSDs is calculated from the target number of
OSDs and the required percentage of OSDs.
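For example (variable names are placeholders):
---
EXPECTED_OSDS=$(awk -v t="${TARGET_OSDS}" -v p="${REQUIRED_PERCENT_OF_OSDS}" \
  'BEGIN {print int(t * p / 100)}')
CURRENT_OSDS=$(ceph osd ls | wc -l)
if [ "${CURRENT_OSDS}" -lt "${EXPECTED_OSDS}" ]; then
  echo "only ${CURRENT_OSDS} of ${EXPECTED_OSDS} expected OSDs are present; not creating pools yet"
  exit 1
fi
---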
Change-Id: Iadf36dbeca61c47d9a9db60cf5335e4e1cb7b74b
https://review.opendev.org/733193 removed the reweight_osds()
function from the ceph-client and weighted OSDs as they are added
in the ceph-osd chart instead. Since then some situations have
come up where OSDs were already deployed with incorrect weights
and this function is needed in order to weight them properly later
on. This new version calculates an expected weight for each OSD,
compares it to the OSD's actual weight, and makes an adjustment if
necessary.
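Roughly (osd_id and expected_weight are placeholders; the chart's
actual comparison may round the values, and the "ceph osd df" JSON
field names may vary by release):
---
actual_weight=$(ceph osd df -f json | jq -r --argjson id "${osd_id}" \
  '.nodes[] | select(.id == $id) | .crush_weight')
if [ "${actual_weight}" != "${expected_weight}" ]; then
  ceph osd crush reweight "osd.${osd_id}" "${expected_weight}"
fi
---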
Change-Id: I58bc16fc03b9234a08847d29aa14067bec05f1f1
The PS updates the helm test and replaces the "expected_osds" variable
with the number of OSDs available in the cluster (ceph-client).
The PS also updates the logic for calculating the minimum number of OSDs.
Change-Id: Ic8402d668d672f454f062bed369cac516ed1573e
Unrestrict the octal values rule, since the readability benefit for
file modes outweighs possible issues with YAML 1.2 adoption in future
k8s versions. These issues will be addressed when/if they occur.
Also ensure osh-infra is a required project for the lint job, which
matters when running the job against another project.
Change-Id: Ic5e327cf40c4b09c90738baff56419a6cef132da
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
Currently OSDs are added by the ceph-osd chart with zero weight
and they get reweighted to proper weights in the ceph-client chart
after all OSDs have been deployed. This causes a problem when a
deployment is partially completed and additional OSDs are added
later. In this case the ceph-client chart has already run and the
new OSDs don't ever get weighted correctly. This change weights
OSDs properly as they are deployed instead. As noted in the
script, the noin flag may be set during the deployment to prevent
rebalancing as OSDs are added if necessary.
Added the ability to set and unset Ceph cluster flags in the
ceph-client chart.
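For example, around the OSD deployment:
---
ceph osd set noin     # keep newly added OSDs from triggering rebalancing
# ... deploy and weight OSDs ...
ceph osd unset noin
---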
Change-Id: Ic9a3d8d5625af49b093976a855dd66e5705d2c29
This commit rewrites the lint job to make template linting available.
Currently yamllint is run in warning mode against all templates
rendered with default values. Detected duplicates and issues will be
addressed in subsequent commits.
Also, all y*ml files are added for linting and corresponding code changes
are made. For non-templates, warning rules are disabled to improve
readability. Chart and requirements yamls are also modified in the name
of consistency.
Change-Id: Ife6727c5721a00c65902340d95b7edb0a9c77365
This patch currently breaks the cinder helm test in the OSH cinder jobs,
blocking the gate. Proposing to revert to unblock the jobs.
This reverts commit f59cb11932.
Change-Id: I73012ec6f4c3d751131f1c26eea9266f7abc1809
Currently OSDs are added by the ceph-osd chart with zero weight
and they get reweighted to proper weights in the ceph-client chart
after all OSDs have been deployed. This causes a problem when a
deployment is partially completed and additional OSDs are added
later. In this case the ceph-client chart has already run and the
new OSDs don't ever get weighted correctly. This change weights
OSDs properly as they are deployed instead. As noted in the
script, the noin flag may be set during the deployment to prevent
rebalancing as OSDs are added if necessary.
Added the ability to set and unset Ceph cluster flags in the
ceph-client chart.
Change-Id: Iac50352c857d874f3956776c733d09e0034a0285