Under some circumstances, the Armada job attempts to recreate an existing
Service Account for ceph-mgr. This patchset remediates the issue.
Change-Id: I69bb9045c0e2f24dc2fa9e94ab6a09a58221e1f5
This change corrects the ceph-templates configmap name to be
release-specific like the other configmaps in the chart. This
allows for more robustness in downstream implementations.
Change-Id: I1d09d14f9ba94dbbe11d8a80776f57b9cdf41210
The recent name changes to the ceph-mon configmaps were not
propagated to all resources in the chart. The hard-coded names in
the unchanged cases were correct and the resources deployed
successfully, but this change updates those configmap names across
all resources for the sake of robustness.
Change-Id: I3195e5ba2726892a7b6e0c31c0fac43bae4aa399
This change makes the ceph-mon configmap names dynamic based on
release name to match how the ceph-osd chart is naming configmaps.
The new ceph-mon post-apply job needs this in some cases in order
not to have conflicting configmap names in separate releases.
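A minimal Python sketch of the naming scheme, assuming each name is the release name followed by a hyphen and a base name; the `configmap_name` helper and the `etc`/`bin` base names are hypothetical and only illustrate why per-release names keep two releases from colliding.

```python
def configmap_name(release_name: str, base: str) -> str:
    """Hypothetical helper: prefix a configmap base name with the release name."""
    return f"{release_name}-{base}"


# Two releases of the same chart render different names for the same
# configmap, so each release's post-apply job mounts its own configmaps.
for release in ("ceph-mon", "ceph-mon-site-b"):
    print([configmap_name(release, base) for base in ("etc", "bin")])
```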
Change-Id: Id26d0a8310ccff80a608e25d2b0a74a41f9e6a55
This is a code improvement to reuse the ceph monitor discovery function
in different templates. Calling this function from a single place
(helm-infra snippets) reduces code maintenance and simplifies further
development.
Rev. 0.1 Chart version bump for ceph-client, ceph-mon, ceph-osd,
ceph-provisioners and helm-toolkit
Rev. 0.2 Mon endpoint discovery functionality added for
the rados gateway. ClusterRole and ClusterRoleBinding added.
Rev. 0.3 checkdns is allowed to correct ceph.conf for the RGW deployment.
Rev. 0.4 Added RoleBinding to the deployment-rgw.
Rev. 0.5 Removed _namespace-client-ceph-config-manager.sh.tpl and
the corresponding job, because of duplicated functionality.
Related configuration has been removed.
Rev. 0.6 RoleBinding logic has been changed to meet the following rule:
checkdns namespace - HAS ACCESS -> RGW namespace(s)
Change-Id: Ie0af212bdcbbc3aa53335689deed9b226e5d4d89
If the OnDelete pod restart strategy is used for the ceph-mon
daemonset, run a post-apply job to restart the ceph-mon pods one
at a time. Otherwise the mons could restart before the mgrs, which
can be problematic in some upgrade scenarios.
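The restart sequence can be sketched in Python as below. `delete_pod` and `wait_for_quorum` are hypothetical callbacks standing in for the Kubernetes API call and the Ceph quorum check a real job would use; this illustrates the one-at-a-time ordering only and is not the job's actual script.

```python
from typing import Callable, Iterable, List


def restart_mons_one_at_a_time(
    mon_pods: Iterable[str],
    delete_pod: Callable[[str], None],
    wait_for_quorum: Callable[[], bool],
) -> List[str]:
    """Delete mon pods one by one, waiting for quorum after each.

    With the OnDelete update strategy, the DaemonSet controller only creates
    an updated pod after the old one is deleted, so the job drives the
    rollout itself instead of relying on a rolling update.
    """
    restarted: List[str] = []
    for pod in mon_pods:
        delete_pod(pod)  # the controller recreates it from the new template
        if not wait_for_quorum():
            raise RuntimeError(f"quorum not restored after restarting {pod}")
        restarted.append(pod)
    return restarted


# Usage with stand-in callbacks:
deleted: List[str] = []
print(restart_mons_one_at_a_time(["mon-a", "mon-b", "mon-c"], deleted.append, lambda: True))
```

Stopping at the first mon that fails to rejoin quorum keeps a bad upgrade from taking down every mon.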
Change-Id: I57f87130e95088217c3cfe73512caaae41d3ef22
This change moves the ceph-mgr deployment from the ceph-client
chart to the ceph-mon chart. Its purpose is to facilitate the
proper Ceph upgrade procedure, which prescribes restarting mgr
daemons before mon daemons.
There will be additional work required to implement the correct
daemon restart procedure for upgrades. This change only addresses
the move of the ceph-mgr deployment.
Change-Id: I3ac4a75f776760425c88a0ba1edae5fb339f128d
This change adds a condition to ensure that an IP address was
obtained for a ceph-mon kubernetes endpoint before building the
expected endpoint string and checking it against the monmap. If an
IP address isn't available, the check is skipped for that mon.
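A Python sketch of the guarded check, assuming the monmap has already been parsed into a name-to-address mapping and that addresses take the `[v2:IP:3300/0,v1:IP:6789/0]` form; the function and parameter names are hypothetical, and the ports are parameters defaulting to the standard v1/v2 ports.

```python
from typing import Dict, List, Optional


def mons_missing_from_monmap(
    k8s_endpoints: Dict[str, Optional[str]],
    monmap_addrs: Dict[str, str],
    v1_port: int = 6789,
    v2_port: int = 3300,
) -> List[str]:
    """Return mons whose expected endpoint string is not in the monmap.

    Mons whose Kubernetes endpoint has no IP yet are skipped, instead of
    building a malformed string such as "[v2::3300/0,v1::6789/0]".
    """
    missing: List[str] = []
    for name, ip in k8s_endpoints.items():
        if not ip:
            continue  # no IP obtained yet: skip the check for this mon
        expected = f"[v2:{ip}:{v2_port}/0,v1:{ip}:{v1_port}/0]"
        if monmap_addrs.get(name) != expected:
            missing.append(name)
    return missing
```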
Change-Id: I45a2e2987b5ef0c27b0bb765f7967fcce1af62e4
Previously the ceph-mon-check pod only knew about the v1 port and did not
have the proper mon_host configuration in its ceph.conf file. This
patchset adds knowledge of the v2 port as well and correctly configures
the ceph.conf file. It also fixes a hard-coded namespace that was found
in the last ceph-mon-check fix.
Change-Id: I460e43864a2d4b0683b67ae13bf6429d846173fc
A race condition exists that can cause the mon-check pod to delete
mons from the monmap that are only down temporarily. This sometimes
causes issues with the monmap when those mons come back up. This
change adds a check to see if the list of mons in the monmap is
larger than expected before removing anything. If not, the monmap
is left alone.
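The guard can be sketched in Python as follows. `expected_count` is an assumed input (for example, the number of mons the deployment is configured to run), and `mons_to_remove` is a hypothetical name; the point is that a temporarily down mon is never pruned while the monmap is at or below its expected size.

```python
from typing import Iterable, List


def mons_to_remove(
    monmap_mons: Iterable[str],
    running_mons: Iterable[str],
    expected_count: int,
) -> List[str]:
    """Choose mons to drop from the monmap, guarding against a race.

    A mon that is only down temporarily is absent from ``running_mons`` but
    still belongs in the monmap, so nothing is removed unless the monmap
    holds more mons than expected.
    """
    monmap = list(monmap_mons)
    if len(monmap) <= expected_count:
        return []  # monmap is not oversized: leave it alone
    running = set(running_mons)
    return [mon for mon in monmap if mon not in running]
```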
Change-Id: I43b186bf80741fc178c6806d24c179417d7f2406
This change updates the helm-toolkit path in each chart as part
of the move to Helm v3. This is needed because Helm v3 no longer
provides helm serve.
Change-Id: I011e282616bf0b5a5c72c1db185c70d8c721695e
If labels are not specified on a Job, Kubernetes defaults them
to include the labels of their underlying Pod template. Helm 3
injects metadata into all resources [0], including an
`app.kubernetes.io/managed-by: Helm` label. As a result, a Job's
labels are no longer empty when Kubernetes sees them, so they are
not defaulted to the underlying Pod template's labels. This is a
problem because the following depend on Job labels:
- Armada pre-upgrade delete hooks
- Armada wait logic configurations
- kubernetes-entrypoint dependencies
Therefore, this change adds labels matching the underlying Pod
template to each Job template, retaining the same labels that were
present with Helm 2.
[0]: https://github.com/helm/helm/pull/7649
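The defaulting behavior described above can be modelled in a few lines of Python. This is a simplified model, not Kubernetes code, and the `application`/`component` label keys are example values assumed for illustration.

```python
from typing import Dict

Labels = Dict[str, str]


def effective_job_labels(job_labels: Labels, pod_template_labels: Labels) -> Labels:
    """Model of the defaulting rule: empty Job labels fall back to the
    Pod template's labels; non-empty Job labels are used unchanged."""
    return dict(job_labels) if job_labels else dict(pod_template_labels)


pod_labels = {"application": "ceph", "component": "mon-check"}  # example labels
helm3_labels = {"app.kubernetes.io/managed-by": "Helm"}

# Helm 3 alone: the Job keeps only the injected label.
print(effective_job_labels(helm3_labels, pod_labels))
# With the fix: the template repeats the Pod labels on the Job itself.
print(effective_job_labels({**pod_labels, **helm3_labels}, pod_labels))
```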
Change-Id: I3b6b25fcc6a1af4d56f3e2b335615074e2f04b6d
The checkDNS script, which runs inside the ceph-mon pods, has had
a bug for a while now. If a value of "up" is passed in, it adds
braces around it, but then doesn't check for the braces when
checking for a value of "up". This causes a value of "{up}" to be
written into the mon_host line of ceph.conf, which leaves the
mon_host unable to respond to ceph/rbd commands. It's
normally not a problem if DNS is working, but if DNS stops working
this can happen.
This patch changes the comparison to look for "{up}" instead of
"up" in three different files, which should fix the problem.
Change-Id: I89cf07b28ad8e0e529646977a0a36dd2df48966d
This PS updates the mon-check reap-zombies python script to account for
recent Ceph changes, including the fact that there are now both
v1 and v2 backends. In addition, it executes the reap-zombies script
with the python3 binary, as the basic 'python' binary does not exist
in the container.
Change-Id: Id079671f03cc5ddbe694f2aa8c9d2480dc573983
This change configures Ceph daemon pods so that
/var/lib/ceph/crash maps to a hostPath location that persists
when the pod restarts. This will allow for post-mortem examination
of crash dumps to attempt to understand why daemons have crashed.
Change-Id: I53277848f79a405b0809e0e3f19d90bbb80f3df8
The current implementation of the Ceph CSI provisioner is tied too
closely to the older Ceph RBD provisioner, which doesn't let the
deployer deploy the Ceph CSI provisioner without the old RBD provisioner.
This patchset decouples them so that they can be deployed
independently of one another.
A few other changes are needed as well:
1) The deployment/gate scripts are updated so that the old RBD and
CSI RBD provisioners are separately enabled/disabled as needed.
The original RBD provisioner is now deprecated.
2) Ceph-mon chart is updated because it had some RBD storageclass
data in values.yaml that is not needed for ceph-mon deployment.
3) Fixed a couple of bugs in job-cephfs-client-key.yaml where RBD
parameters were being used instead of cephfs parameters.
Change-Id: Icb5f78dcefa51990baf1b6d92411eb641c2ea9e2
This will make mirroring the official Docker images easier.
Signed-off-by: Thiago Brito <thiago.brito@windriver.com>
Change-Id: I0f9177b0b83e4fad599ae0c3f3820202bf1d450d
Since Kubernetes v1.11, the annotation `service.alpha.kubernetes.io/tolerate-unready-endpoints` is deprecated; Service.spec.publishNotReadyAddresses should be used instead.
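As a sketch of the migration on a Service manifest represented as a Python dict: the annotation's string value moves to the boolean Service.spec.publishNotReadyAddresses field. The `migrate_unready_endpoints` helper and the example Service name are hypothetical.

```python
import copy
from typing import Any, Dict

DEPRECATED_ANNOTATION = "service.alpha.kubernetes.io/tolerate-unready-endpoints"


def migrate_unready_endpoints(service: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a Service manifest with the deprecated annotation
    replaced by spec.publishNotReadyAddresses."""
    svc = copy.deepcopy(service)
    annotations = svc.get("metadata", {}).get("annotations", {})
    value = annotations.pop(DEPRECATED_ANNOTATION, None)
    if value is not None:
        svc.setdefault("spec", {})["publishNotReadyAddresses"] = value == "true"
    return svc


before = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "ceph-mon-discovery", "annotations": {DEPRECATED_ANNOTATION: "true"}},
    "spec": {"clusterIP": "None"},
}
print(migrate_unready_endpoints(before)["spec"])
```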
Change-Id: Ic4f82b8e78770ff29637937c4bcb9af71b53f8d3
This updates the checkObjectReplication.py script to use python3,
since python2 was removed from the Ceph images.
Change-Id: I006a4becaeefb2a0cbef6f5d1fb56c7fc40b0170
This PS is to address security best practices concerning running
containers as a non-privileged user and disallowing privilege
escalation.
Change-Id: If4c0e9fe446091ba75d1a9818ffd3a0933285af4
This addresses zombie processes found in ceph-mon containers due
to the mon-check.sh monitoring script. With shareProcessNamespace enabled,
the /pause container properly reaps the defunct processes.
Change-Id: Ic111fd28b517f4c9b59ab23626753e9c73db1b1b
Since a chart version check was introduced in the gates, requirements
are no longer satisfied by a strict check against 0.1.0.
Change-Id: I15950b735b4f8566bc0018fe4f4ea9ba729235fc
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
Added chart linting to the Zuul CI to improve chart stability.
Fixed some lint errors in the current charts.
Change-Id: I9df4024c7ccf8b3510e665fc07ba0f38871fcbdb
1) Changed the pod and container names to be picked dynamically for
osd, mon, mgr and mds.
2) Added an init container for ceph-provisioners.
Change-Id: I3e27d51c055010cff982ddb0951d01ea8adac234
Signed-off-by: diwakar thyagaraj <diwakar.chitoor.thyagaraj@att.com>
Fix issues introduced by https://review.opendev.org/#/c/735648:
an extra 'ceph-' in service_account, and the security context not
being rendered for keyring generator containers.
Change-Id: Ie53b3407dbd7345d37c92c60a04f3badf735f6a6
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
Lift the octal values restriction in the lint rules, since the
readability benefits of octal file modes outweigh possible issues with
YAML 1.2 adoption in future k8s versions. These issues will be
addressed when/if they occur.
Also ensure osh-infra is a required project for the lint job, which
matters when running the job against another project.
Change-Id: Ic5e327cf40c4b09c90738baff56419a6cef132da
Signed-off-by: Andrii Ostapenko <andrii.ostapenko@att.com>
This updates the ceph-mon chart to include the pod
security context on the pod template.
This also adds the container security context to set the
readOnlyRootFilesystem flag to true.
Change-Id: I4c9e292eaf3d76ee80f50553d1cbc8cdc6f57cac
This commit rewrites the lint job to make template linting available.
Currently yamllint is run in warning mode against all templates
rendered with default values. Detected duplicates and other issues will
be addressed in subsequent commits.
All y*ml files are also added to linting, and the corresponding code
changes are made. For non-templates, warning rules are disabled to
improve readability. Chart and requirements yamls are also modified
for the sake of consistency.
Change-Id: Ife6727c5721a00c65902340d95b7edb0a9c77365
The PS adds kubernetes tolerations for deployments from ceph-client,
ceph-mon, ceph-provisioners and ceph-rgw charts.
Change-Id: If96f5f2058fca6e145e537e95af39089f441ccbb
The current copyright refers to a non-existent group
"openstack helm authors" with often out-of-date references that
are confusing when adding a new file to the repo.
This change removes all references to this copyright by the
non-existent group and any blank lines underneath.
Change-Id: I1882738cf9757c5350a8533876fd37b5920b5235
Cephfs tests were disabled in order to merge
https://review.opendev.org/695568 due to gate failures that were
blocking it. CephFS isn't used in openstack-helm-infra, so it
wasn't required for that work. This change re-enables the cephfs
tests so we can work through any issues that are causing further
failures.
Since the issue was fixed in 14.2.8, all daemons are upgraded to 14.2.8.
(https://tracker.ceph.com/issues/43770)
Change-Id: I376d39b7ee00ccb1ab8046b58f92b19a822272e1
An entire rack's OSDs are not being marked out after the
down_out interval. This manifested itself during
resiliency testing when all interfaces were brought
down on a control plane host and the down_out interval
was exceeded.
Change-Id: I6f4a69ec442c3e768feb7bd74c7d610aa9d4aa67
This PS updates the bind mounts for the ceph log directories to be
emptyDirs. This ensures we do not permanently pollute the hosts
with ceph logs, which should be directed to stdout.
Change-Id: I6d72c0864b9ecc493cd62564e0e0450d90cfcf00
Signed-off-by: Pete Birley <pete@port.direct>
Since AppArmor configs have been moved to value overrides, this is removed.
Change-Id: Ia23c34c2ed76fceb78f68e609066139b69e09e61
Signed-off-by: diwakar thyagaraj <diwakar.chitoor.thyagaraj@att.com>
This redirects all daemon logs to stdout to avoid accumulating
large log files on the filesystem.
NOTE: The ceph-osd daemon won't work this way and is addressed
separately in https://review.opendev.org/715295. All other Ceph
daemons are included here.
Change-Id: I3045d6e941791aba14979472fac1bca09776d3bf
This updates the ceph-mon stop script so that it no longer removes mons
from the monmap, since in multinode clusters three mons are required in
the monmap to handle quorum properly.
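Ceph monitors need a strict majority of the mons in the monmap to form a quorum. The arithmetic below (a plain Python illustration, not chart code) shows why shrinking a three-mon monmap is risky: three mons tolerate one failure, while two tolerate none.

```python
def quorum_size(monmap_size: int) -> int:
    """Smallest strict majority of the mons in the monmap."""
    return monmap_size // 2 + 1


def failures_tolerated(monmap_size: int) -> int:
    """How many mons can be down while a quorum can still form."""
    return monmap_size - quorum_size(monmap_size)


for size in (3, 2):
    print(size, quorum_size(size), failures_tolerated(size))
```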
Change-Id: I0dd643007ea0558244bfecae1d90db78828e9834
This updates the startup scripts of all Ceph daemons for the msgr2
protocol and adds the v2 port to the mon_host config.
This also removes the mon_addr setting, since mon_host is already configured.
v1 default port: 6789
v2 default port: 3300
Change-Id: I3d95edbd89f5ac8b40a34f41c1099311cee4f875
This updates the mon_host configuration to support both v1 and v2
of the messenger protocol.
ex: mon_host = [v1:172.29.0.11:6790/0,v2:172.29.0.11:3300/0]
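A small Python sketch that renders the example above from a list of mon IPs, defaulting to the standard ports (6789 for v1, 3300 for v2); the example overrides the v1 port with 6790 to match. The `mon_host` helper is hypothetical, and joining multiple mons with commas is an assumption.

```python
from typing import Iterable


def mon_host(ips: Iterable[str], v1_port: int = 6789, v2_port: int = 3300) -> str:
    """Build a mon_host value with a v1 and a v2 address for each mon."""
    addrvecs = [f"[v1:{ip}:{v1_port}/0,v2:{ip}:{v2_port}/0]" for ip in ips]
    return ",".join(addrvecs)  # assumed: one bracketed entry per mon, comma-separated


print(mon_host(["172.29.0.11"], v1_port=6790))
# prints [v1:172.29.0.11:6790/0,v2:172.29.0.11:3300/0]
```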
Change-Id: I02785ea42c07d1aecbef2cf0c32dd6a1a236659f
Signed-off-by: Pete Birley <pete@port.direct>
This upgrades the Ceph version from 14.2.5 to 14.2.7 and also
updates the Ceph provisioners to use the latest code from quay.io:
- rbd-provisioner: quay.io/external_storage/rbd-provisioner:v2.1.1-k8s1.11
- cephfs-provisioner: quay.io/external_storage/cephfs-provisioner:v2.1.0-k8s1.11
This also updates the verbs for the provisioners' clusterrole to support the new code.
Change-Id: Ia94129574610bb5c800a6941804e58ca3aefce65