Provisioner cluster:
This patch creates a Certificate resource in the provisioner for each
host associated with the project. By default, one host is created for
each stage with the short hostname set to the project name.
A namespace is also created for each project so the ESO creds refresher
can manage service accounts for SecretStore resources in the workload
clusters.
Workload cluster:
For each env, plus one system namespace per stage:
- Namespace per env
- SecretStore per env
- ExternalSecret per host in the env
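The per-env resource set above can be sketched as a small generator.
This is a minimal sketch; the env and host names used below are
illustrative, not from the patch:

```go
package main

import "fmt"

// resourcesFor lists the per-env resources described above: one
// Namespace, one SecretStore, and one ExternalSecret per host.
func resourcesFor(env string, hosts []string) []string {
	out := []string{"Namespace/" + env, "SecretStore/" + env}
	for _, h := range hosts {
		out = append(out, "ExternalSecret/"+env+"/"+h)
	}
	return out
}

func main() {
	for _, r := range resourcesFor("dev", []string{"holos.dev.ois.run", "holos.dev.k1.ois.run"}) {
		fmt.Println(r)
	}
}
```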
Common names for the holos project, prod stage:
- holos.k1.ois.run
- holos.k2.ois.run
- holos.ois.run
Common names for the holos project, dev stage:
- holos.dev.k1.ois.run
- holos.dev.k2.ois.run
- holos.dev.ois.run
- holos.gary.k1.ois.run
- holos.gary.k2.ois.run
- holos.gary.ois.run
- holos.jeff.k1.ois.run
- holos.jeff.k2.ois.run
- holos.jeff.ois.run
- holos.nate.k1.ois.run
- holos.nate.k2.ois.run
- holos.nate.ois.run
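Assuming the scheme implied by the lists above (the prod env drops its
name segment; every other env sits between the project name and the
cluster or domain segment), the common names can be generated with a
small sketch:

```go
package main

import "fmt"

// commonNames sketches the naming scheme shown above. Assumption: the
// prod env drops its name segment, every other env is inserted between
// the project and the cluster (or domain) segments.
func commonNames(project, domain string, envs, clusters []string) []string {
	var names []string
	for _, env := range envs {
		prefix := project
		if env != "prod" {
			prefix = project + "." + env
		}
		for _, cluster := range clusters {
			names = append(names, prefix+"."+cluster+"."+domain)
		}
		names = append(names, prefix+"."+domain)
	}
	return names
}

func main() {
	for _, n := range commonNames("holos", "ois.run",
		[]string{"prod", "dev", "gary", "jeff", "nate"},
		[]string{"k1", "k2"}) {
		fmt.Println(n)
	}
}
```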
Usage:
holos render --cluster-name=provisioner \
~/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/provisioner/projects/...
holos render --cluster-name=k1 \
~/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/workload/projects/...
holos render --cluster-name=k2 \
~/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/workload/projects/...
This patch introduces a new BuildPlan spec.components.resources
collection, which is a map version of
spec.components.kubernetesObjectsList. The map version is much easier
to work with and produce in CUE than the list version.
The list version should be deprecated and removed prior to public
release.
The `projects` holos instance renders multiple holos components, each
containing kubernetes api objects defined directly in CUE.
<project>-system is intended for the ext auth proxy providers for all
stages.
<project>-namespaces is intended to create a namespace for each
environment in the project.
The intent is to expand the platform level definition of a project to
include the per-stage auth proxy and per-env role bindings. Secret
Store and ESO creds refresher resources will also be defined by the
platform level definition of a project.
This patch establishes the BuildPlan struct as the single API contract
between CUE and Holos. A BuildPlan spec contains a list of each of the
supported holos component types.
The purpose of this data structure is to support the use case of one CUE
instance generating 1 build plan that contains 0..N of each type of
holos component.
Multiple components per CUE instance are needed to generate a
collection of roughly four (N~4) flux kustomization resources per
project across roughly six (P~6) projects, all built from one CUE
instance.
Tested with:
holos render --cluster-name=k2 ~/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/foundation/cloud/init/namespaces/...
Common labels are removed because they're too tightly coupled to the
model of one component per CUE instance.
This patch refactors the Go structs used to decode CUE output for
processing by the holos cli. For context, the purpose of the structs
is to inform holos how the data from CUE should be modeled and
processed as a rendering pipeline that provides rendered yaml to
configure kubernetes api objects.
The structs share common fields in the form of the HolosComponent
embedded struct. The three main holos component kinds today are:
1. KubernetesObjects - CUE outputs a nested map where each value is a
single rendered api object (resource).
2. HelmChart - CUE outputs the chart name and values. Holos calls helm
template to render the chart. Additional api objects may be
overlaid into the rendered output. Kustomize may also optionally be
called at the end of the render pipeline.
3. KustomizeBuild - CUE outputs data to construct a kustomize
kustomization build. The holos component contains raw yaml files to
use as kustomization resources. CUE optionally defines additional
patches, common labels, etc.
With the Go structs, CUE can directly import the definitions, making it
easier to keep the CUE definitions in sync with what the holos cli
expects to receive.
The holos component types may be imported into cue using:
cue get go github.com/holos-run/holos/api/v1alpha1/...
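The embedded-struct pattern might look roughly like this. Only the
HolosComponent name and the three kind names come from the text; the
field sets are assumptions for illustration:

```go
package main

import "fmt"

// HolosComponent holds fields common to every component kind.
type HolosComponent struct {
	Kind string `json:"kind"`
}

// KubernetesObjects: a nested map of rendered api objects.
type KubernetesObjects struct {
	HolosComponent
	Objects map[string]map[string]any `json:"objects"`
}

// HelmChart: chart name and values; holos calls helm template.
type HelmChart struct {
	HolosComponent
	Chart  string         `json:"chart"`
	Values map[string]any `json:"values"`
}

// KustomizeBuild: raw yaml files used as kustomization resources.
type KustomizeBuild struct {
	HolosComponent
	Resources []string `json:"resources"`
}

func main() {
	hc := HelmChart{HolosComponent: HolosComponent{Kind: "HelmChart"}, Chart: "zitadel"}
	// Kind is promoted from the embedded struct.
	fmt.Println(hc.Kind, hc.Chart)
}
```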
Without this patch ks/prod-iam-zitadel often gets blocked waiting for
jobs that will never complete. In addition, flux should not manage the
zitadel-test-connection Pod, which is an unnecessary artifact of the
upstream helm chart.
We'd disable helm hooks, but they're necessary to create the init and
setup jobs.
This patch also changes the default behavior of Kustomizations from
wait: true to wait: false. Waiting is expensive for the api server and
slows down the reconciliation process considerably.
Component authors should use ks.spec.healthChecks to target specific
important resources to watch and wait for.
This patch fixes the problem of the actions runner scale set listener
pod failing every 3 seconds. See
https://github.com/actions/actions-runner-controller/issues/3351
The solution is not ideal: if the primary cluster is down, workflows
will not execute. The primary cluster shouldn't go down, though, so
this is the trade-off: lower log spam and resource usage by eliminating
the failing pods on other clusters, in exchange for lower availability
if the primary cluster is unavailable.
We could let the pods loop and if the primary is unavailable another
would quickly pick up the role, but it doesn't seem worth it.
The effect of this patch is limited to refreshing credentials only for
namespaces that exist in the local cluster. There is structure in place
in the CUE code to allow for namespaces bound to specific clusters, but
this is used only by the optional Vault component.
This patch was an attempt to work around
https://github.com/actions/actions-runner-controller/issues/3351 by
deploying the runner scale sets into unique namespaces.
This effort was a waste of time; only one listener pod successfully
registered for a given scale set name / group combination.
Because we have only one group, named Default, we can only have one
listener pod globally for a given scale set name.
Because we want our workflows to execute regardless of the availability
of a single cluster, we're going to let this fail for now. The pod
retries every 3 seconds. When a cluster is destroyed, another cluster
will quickly register.
A follow up patch will look to expand this retry behavior.
This patch migrates the vault component from [holos-infra][1] to a
CUE-based component. Vault is optional in the reference platform, so
this patch also defines an `#OptionalServices` struct to conditionally
manage a service across multiple clusters in the platform.
The primary use case for optional services is managing a namespace to
provision and provide secrets across clusters.
[1]: https://github.com/holos-run/holos-infra/tree/v0.5.0/components/core/core/vault
Pods are unnecessarily created when deploying helm-based holos
components and often fail. Prevent these test pods by disabling helm
hooks with the `--no-hooks` flag.
Closes: #54
Problem:
The standby cluster on k2 fails to start. A pgbackrest pod first
restores the database from S3, then the pgha nodes try to replay the WAL
as part of the standby initialization process. This fails because the
PGDATA directory is not empty.
Solution:
Specify the spec.dataSource field only when the cluster is configured as
a primary cluster.
Result:
Non-primary clusters are standbys; they skip the pgbackrest job that
restores from S3 and move straight to patroni replaying the WAL from S3
in the pgha pods.
One of the two pgha pods becomes the "standby leader" and restores the
WAL from S3. The other is a cascading standby and then restores the
same WAL from the standby leader.
After 8 minutes both pods are ready.
```
❯ k get pods
NAME READY STATUS RESTARTS AGE
zitadel-pgbouncer-d9f8cffc-j469g 2/2 Running 0 11m
zitadel-pgbouncer-d9f8cffc-xq29g 2/2 Running 0 11m
zitadel-pgha1-27w7-0 4/4 Running 0 11m
zitadel-pgha1-c5qj-0 4/4 Running 0 11m
zitadel-repo-host-0 2/2 Running 0 11m
```
Problem:
The k3 and k4 clusters are getting the Zitadel components which are
really only intended for the core cluster pair.
Solution:
Split the workload subtree into two, named foundation and accounts. The
core cluster pair gets foundation+accounts while the kX clusters get
just the foundation subtree.
Result:
prod-zitadel-iam is no longer managed on k3 and k4
Set the restore point to the log line `time="2024-03-11T17:08:58Z"
level=info msg="crunchy-pgbackrest ends"`, which is just after Gary and
Nate registered and were granted the cluster-admin role.
The [Streaming Standby][standby] architecture requires custom tls certs
for two clusters in two regions to connect to each other.
This patch manages the custom certs following the configuration
described in the article [Using Cert Manager to Deploy TLS for Postgres
on Kubernetes][article].
NOTE: One thing not mentioned anywhere in the crunchy documentation is
how custom tls certs work with pgbouncer. The pgbouncer service uses a
tls certificate issued by the pgo root cert, not by the custom
certificate authority.
For this reason, we use kustomize to patch the zitadel Deployment and
the zitadel-init and zitadel-setup Jobs. The patch projects the ca
bundle from the `zitadel-pgbouncer` secret into the zitadel pods at
/pgbouncer/ca.crt.
[standby]: https://access.crunchydata.com/documentation/postgres-operator/latest/architecture/disaster-recovery#streaming-standby-with-an-external-repo
[article]: https://www.crunchydata.com/blog/using-cert-manager-to-deploy-tls-for-postgres-on-kubernetes
A full backup was taken using:
```
kubectl annotate postgrescluster zitadel postgres-operator.crunchydata.com/pgbackrest-backup="$(date)"
```
And completed with:
```
❯ k logs -f zitadel-backup-5r6v-v5jnm
time="2024-03-10T21:52:15Z" level=info msg="crunchy-pgbackrest starts"
time="2024-03-10T21:52:15Z" level=info msg="debug flag set to false"
time="2024-03-10T21:52:15Z" level=info msg="backrest backup command requested"
time="2024-03-10T21:52:15Z" level=info msg="command to execute is [pgbackrest backup --stanza=db --repo=2 --type=full]"
time="2024-03-10T21:55:18Z" level=info msg="crunchy-pgbackrest ends"
```
This patch verifies the point in time backup is robust in the face of
the following operations:
1. pg cluster zitadel was deleted (whole namespace emptied)
2. pg cluster zitadel was re-created _without_ a `dataSource`
3. pgo initialized a new database and backed up the blank database to
S3.
4. pg cluster zitadel was deleted again.
5. pg cluster zitadel was re-created with `dataSource` `options: ["--type=time", "--target=\"2024-03-10 21:56:00+00\""]` (Just after the full backup completed)
6. Restore completed successfully.
7. Applied the holos zitadel component.
8. Zitadel came up successfully and user login worked as expected.
- [x] Perform an in place [restore][restore] from [s3][bucket].
- [x] Set repo1-retention-full to clear warning
[restore]: https://access.crunchydata.com/documentation/postgres-operator/latest/tutorials/backups-disaster-recovery/disaster-recovery#restore-properties
[bucket]: https://access.crunchydata.com/documentation/postgres-operator/latest/tutorials/backups-disaster-recovery/disaster-recovery#cloud-based-data-source
To establish the canonical https://login.ois.run identity issuer on the
core cluster pair.
Custom resources for PGO have been imported with:
timoni mod vendor crds -f deploy/clusters/core2/components/prod-pgo-crds/prod-pgo-crds.gen.yaml
Note, the zitadel tls connection took considerable effort to get
working. We intentionally use pgo-issued certs to reduce the toil of
managing certs issued by cert manager.
The default tls configuration of pgo is quite good, with verify-full
enabled.
The core2 cluster cannot provision pvcs because it's using the k8s-dev
pool when it has credentials valid only for the k8s-prod pool.
This patch adds an entry to the platform cluster map to configure the
pool for each cluster, with a default of k8s-dev.
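A minimal sketch of the lookup, assuming the cluster map is keyed by
cluster name; only the pool names and the k8s-dev default come from the
patch:

```go
package main

import "fmt"

// clusterPools maps cluster name to its credential pool; clusters not
// listed fall back to the default.
var clusterPools = map[string]string{
	"core2": "k8s-prod",
}

func poolFor(cluster string) string {
	if pool, ok := clusterPools[cluster]; ok {
		return pool
	}
	return "k8s-dev" // default pool
}

func main() {
	fmt.Println(poolFor("core2")) // configured explicitly
	fmt.Println(poolFor("k2"))    // falls back to the default
}
```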
PGO uses plain yaml and kustomize as the recommended installation
method. Holos supports upstream by adding a new PlainFiles component
kind, which simply copies files into place and lets kustomize handle the
generation of the api objects.
CUE is responsible for very little in this kind of component, basically
allowing overlay resources if needed and deferring everything else to
the holos cli.
The holos cli in turn is responsible for executing kubectl kustomize
build on the input directory to produce the rendered output, then writes
the rendered output into place.
Without this patch the arc controller fails to create a listener. The
template for the listener doesn't appear to be configurable from the
chart.
We could patch the listener pod template with kustomize; do this as a
follow-up feature.
With this patch we get the expected two pods in the runner system
namespace:
```
❯ k get pods
NAME READY STATUS RESTARTS AGE
gha-rs-7db9c9f7-listener 1/1 Running 0 43s
gha-rs-controller-56bb9c77d9-6tjch 1/1 Running 0 8s
```
The resource names for the arc controller are too long:
```
❯ k get pods -n arc-systems
NAME READY STATUS RESTARTS AGE
gha-runner-scale-set-controller-gha-rs-controller-6bdf45bd6jx5n 1/1 Running 0 59m
```
Solve the problem by allowing components to set the release name to
`gha-rs-controller` which requires an additional field from the cue code
to differentiate from the chart name.
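The length problem can be checked with a quick sketch, assuming the
common helm fullname convention (release name joined with the chart
name, or just the release name when the two match):

```go
package main

import "fmt"

// Kubernetes object names are DNS labels limited to 63 characters, so
// a long helm fullname leaves no room for replicaset and pod suffixes.
// The names below come from the pod listing above.
func fullname(release, chart string) string {
	if release == chart {
		return release
	}
	return release + "-" + chart
}

func main() {
	long := fullname("gha-runner-scale-set-controller", "gha-rs-controller")
	short := fullname("gha-rs-controller", "gha-rs-controller")
	fmt.Println(len(long), long)   // 49 chars before any pod suffix
	fmt.Println(len(short), short) // 17 chars, plenty of room
}
```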
Separate the SecretStore resources from the namespaces component
because combining them creates a deadlock: the SecretStore CRDs don't
get applied until the eso component is managed.
The namespaces component should contain nothing but core api objects,
no custom resources.
This patch switches CockroachDB to use certs provided by ExternalSecrets
instead of managing Certificate resources in-cluster from the upstream
helm chart.
This paves the way for multi-cluster replication by moving certificates
outside the lifecycle of the workload cluster CockroachDB operates
within.
Closes: #36
Issuing mTLS certs for CockroachDB moves to the provisioner cluster so
we can more easily support cross-cluster replication in the future.
CockroachDB certs will be synced the same way as public tls certs,
using ExternalSecret resources.
This patch uses cert manager in the provisioner cluster to provision tls
certs for https://login.example.com and https://httpbin.k2.example.com
The certs are not yet synced to the clusters. Next step is to replace
the Certificate resources with ExternalSecret resources, then remove
cert manager from the workload clusters.
This patch moves certificate management to the provisioner cluster to
centralize all secrets into the highly secured cluster. This change
also simplifies the architecture in a number of ways:
1. Certificate lifetimes are now completely independent of the cluster
lifecycle.
2. Remove the need for bi-directional sync to save cert secrets.
3. Workload clusters no longer need access to DNS.
Multiple holos components rely on kustomize to modify the output of the
upstream helm chart, for example patching a Deployment to inject the
istio sidecar.
The new holos CUE-based component system did not support running
kustomize after helm template. This patch adds the kustomize execution
when two fields are defined in the HelmChart kind of CUE output.
The API spec is pretty loose in this patch, but I'm proceeding for
expedience and to inform the final API with more use cases as more
components are migrated to CUE.
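The conditional post-render step can be sketched as below; the field
names on the HelmChart type are assumptions, and only the
helm-then-kustomize ordering comes from the patch:

```go
package main

import "fmt"

// HelmChart sketches the helm chart kind of CUE output; a non-nil
// Kustomize field enables the post-render step.
type HelmChart struct {
	Chart     string
	Values    map[string]any
	Kustomize *Kustomize
}

// Kustomize holds the two illustrative fields that trigger the
// kustomize execution after helm template.
type Kustomize struct {
	Kustomization map[string]any // rendered to kustomization.yaml
	ResourcesFile string         // file the helm output is written to
}

// render returns the pipeline that would run for the component.
func render(hc HelmChart) string {
	out := "helm template " + hc.Chart
	if hc.Kustomize != nil {
		out += " | kustomize build"
	}
	return out
}

func main() {
	fmt.Println(render(HelmChart{Chart: "istiod"}))
	fmt.Println(render(HelmChart{Chart: "istiod", Kustomize: &Kustomize{}}))
}
```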