Commit Graph

107 Commits

Jeff McCune
d81e25c4e4 (#66) Project Certificates
Provisioner cluster:

This patch creates a Certificate resource in the provisioner for each
host associated with the project.  By default, one host is created for
each stage with the short hostname set to the project name.

A namespace is also created for each project for eso creds refresher to
manage service accounts for SecretStore resources in the workload
clusters.

Workload cluster:

For each env, plus one system namespace per stage:

 - Namespace per env
 - SecretStore per env
 - ExternalSecret per host in the env

Common names for the holos project, prod stage:

- holos.k1.ois.run
- holos.k2.ois.run
- holos.ois.run

Common names for the holos project, dev stage:

- holos.dev.k1.ois.run
- holos.dev.k2.ois.run
- holos.dev.ois.run
- holos.gary.k1.ois.run
- holos.gary.k2.ois.run
- holos.gary.ois.run
- holos.jeff.k1.ois.run
- holos.jeff.k2.ois.run
- holos.jeff.ois.run
- holos.nate.k1.ois.run
- holos.nate.k2.ois.run
- holos.nate.ois.run
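
The naming pattern above can be sketched as a small Go helper (the function and parameter names here are hypothetical; the real logic lives in CUE): the prod stage uses the bare project name, while other stages insert each environment name between the project and the cluster/domain.

```go
package main

import "fmt"

// commonNames sketches the hostname pattern above.  For the prod stage the
// prefix is just the project name; for other stages one prefix is generated
// per environment.  Each prefix yields one name per cluster plus one bare
// platform-level name.  Names here are illustrative, not the actual API.
func commonNames(project, stage, domain string, clusters, envs []string) []string {
	prefixes := []string{project} // prod: "holos"
	if stage != "prod" {
		prefixes = nil
		for _, env := range envs {
			prefixes = append(prefixes, project+"."+env) // dev: "holos.dev", "holos.gary", ...
		}
	}
	var names []string
	for _, p := range prefixes {
		for _, c := range clusters {
			names = append(names, fmt.Sprintf("%s.%s.%s", p, c, domain))
		}
		names = append(names, p+"."+domain)
	}
	return names
}

func main() {
	fmt.Println(commonNames("holos", "prod", "ois.run", []string{"k1", "k2"}, nil))
}
```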

Usage:

    holos render --cluster-name=provisioner \
      ~/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/provisioner/projects/...
    holos render --cluster-name=k1 \
      ~/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/workload/projects/...
    holos render --cluster-name=k2 \
      ~/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/workload/projects/...
2024-03-27 20:54:51 -07:00
Jeff McCune
c4612ff5d2 (#64) Manage one system namespace per project
This patch introduces a new BuildPlan spec.components.resources
collection, which is a map version of
spec.components.kubernetesObjectsList.  The map version is much easier
to work with and produce in CUE than the list version.

The list version should be deprecated and removed prior to public
release.
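
The ergonomic difference can be illustrated with a Go sketch (type and field names are hypothetical): a map keyed by component name unifies naturally in CUE, and the CLI can flatten it into a deterministic slice, which is roughly what the list form carried.

```go
package main

import (
	"fmt"
	"sort"
)

// Component is a hypothetical stand-in for one holos component entry.
type Component struct {
	Name string
}

// sortedComponents flattens a name-keyed map (the new resources form) into a
// deterministically ordered slice, which is what the CLI ultimately needs
// from either the map or the list shape.
func sortedComponents(m map[string]Component) []Component {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	out := make([]Component, 0, len(keys))
	for _, k := range keys {
		out = append(out, m[k])
	}
	return out
}

func main() {
	m := map[string]Component{
		"namespaces": {Name: "holos-namespaces"},
		"system":     {Name: "holos-system"},
	}
	fmt.Println(sortedComponents(m))
}
```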

The projects holos instance renders multiple holos components, each
containing kubernetes api objects defined directly in CUE.

<project>-system is intended for the ext auth proxy providers for all
stages.

<project>-namespaces is intended to create a namespace for each
environment in the project.

The intent is to expand the platform level definition of a project to
include the per-stage auth proxy and per-env role bindings.  Secret
Store and ESO creds refresher resources will also be defined by the
platform level definition of a project.
2024-03-26 12:23:01 -07:00
Jeff McCune
3c977d22fe (#71) Final refactoring of example code to use BuildPlan
Need to test it on all the clusters now.  Will follow up with any
necessary fixes.
2024-03-22 16:58:52 -07:00
Jeff McCune
e34db2b583 (#71) Refactor provisioner to produce a BuildPlan 2024-03-22 16:42:57 -07:00
Jeff McCune
71de57ac88 (#71) Refactor optional vault service to BuildPlan 2024-03-22 15:54:52 -07:00
Jeff McCune
c7cc661018 (#71) Refactor Zitadel components for BuildPlan
```
❯ holos render --cluster-name k2  ~/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/accounts/iam/zitadel/...
3:04PM INF render.go:43 rendered prod-iam-postgres version=0.60.2 status=ok action=rendered name=prod-iam-postgres
3:04PM INF render.go:43 rendered prod-iam-postgres-certs version=0.60.2 status=ok action=rendered name=prod-iam-postgres-certs
3:04PM INF render.go:43 rendered prod-iam-zitadel version=0.60.2 status=ok action=rendered name=prod-iam-zitadel
```
2024-03-22 15:04:43 -07:00
Jeff McCune
09f39c02fe (#71) Refactor foundation/cloud/secrets components to BuildPlan 2024-03-22 13:50:34 -07:00
Jeff McCune
23c76a73e0 (#71) Refactor pgo components to BuildPlan 2024-03-22 13:29:38 -07:00
Jeff McCune
1cafe08237 (#71) Refactor prod-metal-ceph to use BuildPlan 2024-03-22 12:44:20 -07:00
Jeff McCune
45b07964ef (#71) Refactor the mesh collection to use BuildPlan
This patch refactors the example reference platform to use the new
BuildPlan API.

```
❯ holos render --cluster-name=k2 /home/jeff/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/foundation/cloud/mesh/...
12:19PM INF render.go:43 rendered prod-mesh-cni version=0.60.2 status=ok action=rendered name=prod-mesh-cni
12:19PM INF render.go:43 rendered prod-mesh-gateway version=0.60.2 status=ok action=rendered name=prod-mesh-gateway
12:19PM INF render.go:43 rendered prod-mesh-httpbin version=0.60.2 status=ok action=rendered name=prod-mesh-httpbin
12:19PM INF render.go:43 rendered prod-mesh-ingress version=0.60.2 status=ok action=rendered name=prod-mesh-ingress
12:19PM INF render.go:43 rendered prod-mesh-istiod version=0.60.2 status=ok action=rendered name=prod-mesh-istiod
12:19PM INF render.go:43 rendered prod-mesh-istio-base version=0.60.2 status=ok action=rendered name=prod-mesh-istio-base
```
2024-03-22 12:44:20 -07:00
Jeff McCune
31280acbae (#71) Add HelmChart BuildPlan support
This patch refactors the #HelmChart definition into BuildPlan.HelmCharts,
which executes a collection of HelmCharts.  The same behavior is
preserved: helm template executes first, then a kustomize post-processor
executes.

```
❯ holos render --cluster-name=k2 ~/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/foundation/cloud/github/arc/... --log-level=debug
9:53PM DBG config.go:150 finalized config from flags version=0.60.1 state=finalized
9:53PM DBG builder.go:108 cue: building instances version=0.60.1
9:53PM DBG builder.go:95 cue: equivalent command: cue export --out yaml -t cluster=k2 ./platforms/reference/clusters/foundation/cloud/github/arc/... version=0.60.1
9:53PM DBG builder.go:100 cue: tags [cluster=k2] version=0.60.1
9:53PM DBG builder.go:122 cue: building instance version=0.60.1 dir=/home/jeff/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/foundation/cloud/github/arc
9:53PM DBG builder.go:127 cue: validating instance version=0.60.1 dir=/home/jeff/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/foundation/cloud/github/arc
9:53PM DBG builder.go:131 cue: decoding holos build plan version=0.60.1 dir=/home/jeff/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/foundation/cloud/github/arc
9:53PM DBG builder.go:122 cue: building instance version=0.60.1 dir=/home/jeff/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/foundation/cloud/github/arc/runner
9:53PM DBG builder.go:127 cue: validating instance version=0.60.1 dir=/home/jeff/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/foundation/cloud/github/arc/runner
9:53PM DBG builder.go:131 cue: decoding holos build plan version=0.60.1 dir=/home/jeff/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/foundation/cloud/github/arc/runner
9:53PM DBG result.go:61 ExternalSecret/controller-manager version=0.60.1 kind=ExternalSecret name=controller-manager
9:53PM DBG builder.go:122 cue: building instance version=0.60.1 dir=/home/jeff/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/foundation/cloud/github/arc/system
9:53PM DBG builder.go:127 cue: validating instance version=0.60.1 dir=/home/jeff/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/foundation/cloud/github/arc/system
9:53PM DBG builder.go:131 cue: decoding holos build plan version=0.60.1 dir=/home/jeff/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/foundation/cloud/github/arc/system
9:53PM DBG helm.go:95 helm: wrote values version=0.60.1 chart=oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller path=/tmp/holos1163326896/values.yaml bytes=653
9:53PM DBG run.go:40 running: helm version=0.60.1 name=helm args="[template --no-hooks --include-crds --values /tmp/holos1163326896/values.yaml --namespace arc-system --kubeconfig /dev/null --version 0.8.3 gha-rs-controller /home/jeff/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/foundation/cloud/github/arc/system/vendor/gha-runner-scale-set-controller]"
9:53PM DBG remove.go:15 tmp: removed version=0.60.1 path=/tmp/holos1163326896
9:53PM DBG result.go:95 wrote: /tmp/holos.kustomize3569816247/resources.yaml version=0.60.1 op=write path=/tmp/holos.kustomize3569816247/resources.yaml bytes=2019229
9:53PM DBG result.go:108 wrote: /tmp/holos.kustomize3569816247/kustomization.yaml version=0.60.1 op=write path=/tmp/holos.kustomize3569816247/kustomization.yaml bytes=94
9:53PM DBG run.go:40 running: kubectl version=0.60.1 name=kubectl args="[kustomize /tmp/holos.kustomize3569816247]"
9:53PM DBG remove.go:15 tmp: removed version=0.60.1 path=/tmp/holos.kustomize3569816247
9:53PM DBG result.go:135 out: wrote deploy/clusters/k2/components/prod-github-arc-runner/prod-github-arc-runner.gen.yaml version=0.60.1 action=write path=deploy/clusters/k2/components/prod-github-arc-runner/prod-github-arc-runner.gen.yaml status=ok
9:53PM DBG result.go:135 out: wrote deploy/clusters/k2/holos/components/prod-github-arc-runner-kustomization.gen.yaml version=0.60.1 action=write path=deploy/clusters/k2/holos/components/prod-github-arc-runner-kustomization.gen.yaml status=ok
9:53PM INF render.go:43 rendered prod-github-arc-runner version=0.60.1 status=ok action=rendered name=prod-github-arc-runner
9:53PM DBG result.go:135 out: wrote deploy/clusters/k2/components/prod-github-arc-system/prod-github-arc-system.gen.yaml version=0.60.1 action=write path=deploy/clusters/k2/components/prod-github-arc-system/prod-github-arc-system.gen.yaml status=ok
9:53PM DBG result.go:135 out: wrote deploy/clusters/k2/holos/components/prod-github-arc-system-kustomization.gen.yaml version=0.60.1 action=write path=deploy/clusters/k2/holos/components/prod-github-arc-system-kustomization.gen.yaml status=ok
9:53PM INF render.go:43 rendered prod-github-arc-system version=0.60.1 status=ok action=rendered name=prod-github-arc-system
```
2024-03-22 10:14:04 -07:00
Jeff McCune
6f0928b12c (#71) Add go BuildPlan type as the CUE<->Holos API
This patch establishes the BuildPlan struct as the single API contract
between CUE and Holos.  A BuildPlan spec contains a list of each of the
supported holos component types.

The purpose of this data structure is to support the use case of one CUE
instance generating one build plan that contains 0..N of each type of
holos component.

Multiple components per CUE instance are needed to generate a collection
of roughly four (N~4) flux kustomization resources per project across
roughly six (P~6) projects, all built from one CUE instance.

Tested with:

    holos render --cluster-name=k2 ~/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/foundation/cloud/init/namespaces/...

Common labels are removed because they're too tightly coupled to the
model of one component per one cue instance.
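
The contract described above can be sketched in Go (field names here approximate the description, not the actual v1alpha1 API): one CUE instance yields one BuildPlan whose spec holds zero or more components of each supported kind.

```go
package main

import "fmt"

// BuildPlan is a hypothetical sketch of the CUE<->Holos contract: the spec
// aggregates 0..N components of each supported kind.  Strings stand in for
// full component specs to keep the sketch small.
type BuildPlan struct {
	Kind string
	Spec struct {
		KubernetesObjects []string
		HelmCharts        []string
		KustomizeBuilds   []string
	}
}

// componentCount returns how many components a single build plan renders,
// e.g. ~4 flux kustomizations x ~6 projects from one CUE instance.
func componentCount(bp BuildPlan) int {
	return len(bp.Spec.KubernetesObjects) + len(bp.Spec.HelmCharts) + len(bp.Spec.KustomizeBuilds)
}

func main() {
	var bp BuildPlan
	bp.Kind = "BuildPlan"
	for p := 0; p < 6; p++ {
		for k := 0; k < 4; k++ {
			bp.Spec.KubernetesObjects = append(bp.Spec.KubernetesObjects, fmt.Sprintf("project%d-ks%d", p, k))
		}
	}
	fmt.Println(componentCount(bp)) // 24 components from one instance
}
```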
2024-03-21 16:13:36 -07:00
Jeff McCune
104bda459f (#69) Go Types for CUE/Holos API contract
This patch refactors the go structs used to decode cue output for
processing by the holos cli.  For context, the purpose of the structs
is to inform holos how the data from cue should be modeled and
processed as a rendering pipeline that provides rendered yaml to
configure kubernetes api objects.

The structs share common fields in the form of the HolosComponent
embedded struct.  The three main holos component kinds today are:

 1. KubernetesObjects - CUE outputs a nested map where each value is a
    single rendered api object (resource).
 2. HelmChart - CUE outputs the chart name and values.  Holos calls helm
    template to render the chart.  Additional api objects may be
    overlaid into the rendered output.  Kustomize may also optionally be
    called at the end of the render pipeline.
 3. KustomizeBuild - CUE outputs data to construct a kustomize
    kustomization build.  The holos component contains raw yaml files to
    use as kustomization resources.  CUE optionally defines additional
    patches, common labels, etc.
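
Decoding CUE output into one of these kinds can be sketched with a kind discriminator (a minimal sketch; the field names and the probe-then-dispatch approach are assumptions, not the actual holos implementation):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// HolosComponent sketches the shared embedded struct; field names are
// hypothetical approximations of the API described in the commit.
type HolosComponent struct {
	Kind string `json:"kind"`
	Name string `json:"name"`
}

// HelmChart embeds the common fields and adds chart-specific ones.
type HelmChart struct {
	HolosComponent        // common fields, flattened in JSON
	Chart          string `json:"chart"`
}

// decodeComponent probes the kind field, then dispatches the way the CLI
// must when deciding how to model and process CUE output.
func decodeComponent(data []byte) (string, error) {
	var probe HolosComponent
	if err := json.Unmarshal(data, &probe); err != nil {
		return "", err
	}
	switch probe.Kind {
	case "HelmChart":
		var hc HelmChart
		if err := json.Unmarshal(data, &hc); err != nil {
			return "", err
		}
		return probe.Kind, nil
	case "KubernetesObjects", "KustomizeBuild":
		return probe.Kind, nil
	default:
		return "", fmt.Errorf("unsupported kind: %q", probe.Kind)
	}
}

func main() {
	kind, err := decodeComponent([]byte(`{"kind":"HelmChart","name":"istio-base","chart":"base"}`))
	fmt.Println(kind, err)
}
```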

With the Go structs, cue may directly import the definitions to more
easily keep the CUE definitions in sync with what the holos cli expects
to receive.

The holos component types may be imported into cue using:

    cue get go github.com/holos-run/holos/api/v1alpha1/...
2024-03-20 17:21:10 -07:00
Jeff McCune
bd2effa183 (#61) Improve ks prod-iam-zitadel robustness with flux health checks
Without this patch ks/prod-iam-zitadel often gets blocked waiting for
jobs that will never complete.  In addition, flux should not manage the
zitadel-test-connection Pod which is an unnecessary artifact of the
upstream helm chart.

We'd disable helm hooks, but they're necessary to create the init and
setup jobs.

This patch also changes the default behavior of Kustomizations from
wait: true to wait: false.  Waiting is expensive for the api server and
slows down the reconciliation process considerably.

Component authors should use ks.spec.healthChecks to target specific
important resources to watch and wait for.
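
The new default can be sketched as data (simplified stand-in types for the flux Kustomization API; field names mirror flux but the helper is hypothetical): waiting is off, and only the resources the author names get health-checked.

```go
package main

import "fmt"

// HealthCheck is a simplified stand-in for a flux health check reference.
type HealthCheck struct {
	Kind, Name, Namespace string
}

// KustomizationSpec models only the two fields discussed above.
type KustomizationSpec struct {
	Wait         bool
	HealthChecks []HealthCheck
}

// defaultSpec sketches the new default: no cluster-wide waiting, health
// checks only on the specific resources the component author targets.
func defaultSpec(checks ...HealthCheck) KustomizationSpec {
	return KustomizationSpec{Wait: false, HealthChecks: checks}
}

func main() {
	// The zitadel deployment as an illustrative health-check target.
	spec := defaultSpec(HealthCheck{Kind: "Deployment", Name: "zitadel", Namespace: "prod-iam-zitadel"})
	fmt.Println(spec.Wait, len(spec.HealthChecks))
}
```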
2024-03-15 15:56:43 -07:00
Jeff McCune
562412fbe7 (#57) Run gha-rs scale set only on the primary cluster
This patch fixes the problem of the actions runner scale set listener
pod failing every 3 seconds.  See
https://github.com/actions/actions-runner-controller/issues/3351

The solution is not ideal: if the primary cluster is down, workflows
will not execute.  The primary cluster shouldn't go down, though, so
this is the trade-off: lower log spam and resource usage, by eliminating
the failing pods on other clusters, in exchange for lower availability
if the primary cluster is unavailable.

We could let the pods loop and if the primary is unavailable another
would quickly pick up the role, but it doesn't seem worth it.
2024-03-15 13:13:25 -07:00
Jeff McCune
fd6fbe5598 (#57) Allow gha-rs scale set to fail on all but one cluster
The effect of this patch is limited to refreshing credentials only for
namespaces that exist in the local cluster.  There is structure in place
in the CUE code to allow for namespaces bound to specific clusters, but
this is used only by the optional Vault component.

This patch was an attempt to work around
https://github.com/actions/actions-runner-controller/issues/3351 by
deploying the runner scale sets into unique namespaces.

This effort was a waste of time: only one listener pod successfully
registered for a given scale set name / group combination.

Because we have only one group named Default we can only have one
listener pod globally for a given scale set name.

Because we want our workflows to execute regardless of the availability
of a single cluster, we're going to let this fail for now.  The pod
retries every 3 seconds.  When a cluster is destroyed, another cluster
will quickly register.

A follow up patch will look to expand this retry behavior.
2024-03-15 12:53:16 -07:00
Jeff McCune
67472e1e1c (#60) Disable flux reconciliation of deployment/zitadel on standby clusters 2024-03-14 21:58:32 -07:00
Jeff McCune
d64c3e8c66 (#58) Zitadel Failover RunBook 2024-03-14 15:25:38 -07:00
Jeff McCune
f344f97374 (#58) Restore last zitadel database backup
When the cluster is provisioned, restore the most recent backup instead
of a fixed point in time.
2024-03-14 11:40:17 -07:00
Jeff McCune
770088b912 (#53) Clean up nested if statements with && 2024-03-13 10:35:20 -07:00
Jeff McCune
cb9b39c3ca (#53) Add Vault as an optional service on the core clusters
This patch migrates the vault component from [holos-infra][1] to a cue
based component.  Vault is optional in the reference platform, so this
patch also defines an `#OptionalServices` struct to conditionally manage
a service across multiple clusters in the platform.

The primary use case for optional services is managing a namespace to
provision and provide secrets across clusters.

[1]: https://github.com/holos-run/holos-infra/tree/v0.5.0/components/core/core/vault
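
The `#OptionalServices` idea can be sketched in Go (names and the cluster-selector shape are hypothetical; the real definition is a CUE struct): a service is enabled for the platform, then managed only on the clusters it selects.

```go
package main

import "fmt"

// OptionalService sketches the #OptionalServices concept described above.
// Field names are illustrative, not the actual CUE definition.
type OptionalService struct {
	Name     string
	Enabled  bool
	Clusters map[string]bool // clusters the service is managed on
}

// managedOn reports whether the service should be rendered for a cluster:
// it must be enabled for the platform and selected for that cluster.
func (s OptionalService) managedOn(cluster string) bool {
	return s.Enabled && s.Clusters[cluster]
}

func main() {
	vault := OptionalService{
		Name:     "vault",
		Enabled:  true,
		Clusters: map[string]bool{"core1": true, "core2": true},
	}
	fmt.Println(vault.managedOn("core1"), vault.managedOn("k2"))
}
```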
2024-03-12 17:18:38 -07:00
Jeff McCune
0f34b20546 (#54) Disable helm hooks when rendering components
Pods are unnecessarily created when deploying helm-based holos
components, and they often fail.  Prevent these test pods by disabling
helm hooks with the `--no-hooks` flag.

Closes: #54
2024-03-12 14:14:20 -07:00
Jeff McCune
0d7bbbb659 (#48) Disable pg spec.dataSource for standby cluster
Problem:
The standby cluster on k2 fails to start.  A pgbackrest pod first
restores the database from S3, then the pgha nodes try to replay the WAL
as part of the standby initialization process.  This fails because the
PGDATA directory is not empty.

Solution:
Specify the spec.dataSource field only when the cluster is configured as
a primary cluster.

Result:
Non-primary clusters are standby: they skip the pgbackrest job that
restores from S3 and move straight to patroni replaying the WAL from S3
in the pgha pods.

One of the two pgha pods becomes the "standby leader" and restores the
WAL from S3.  The other is a cascading standby and then restores the
same WAL from the standby leader.

After 8 minutes both pods are ready.

```
❯ k get pods
NAME                               READY   STATUS    RESTARTS   AGE
zitadel-pgbouncer-d9f8cffc-j469g   2/2     Running   0          11m
zitadel-pgbouncer-d9f8cffc-xq29g   2/2     Running   0          11m
zitadel-pgha1-27w7-0               4/4     Running   0          11m
zitadel-pgha1-c5qj-0               4/4     Running   0          11m
zitadel-repo-host-0                2/2     Running   0          11m
```
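
The solution above can be sketched in Go (simplified, hypothetical types standing in for the PGO PostgresCluster API): the dataSource field is populated only when the cluster is configured as the primary.

```go
package main

import "fmt"

// PostgresClusterSpec models only the slice of the PGO API discussed above,
// with a simplified dataSource type.  A nil DataSource means the cluster
// starts as a standby and replays the WAL instead of running a restore job.
type PostgresClusterSpec struct {
	DataSource *string
	Standby    bool
}

// clusterSpec applies the fix: set spec.dataSource only when this cluster
// is configured as the primary.
func clusterSpec(primary bool, source string) PostgresClusterSpec {
	spec := PostgresClusterSpec{Standby: !primary}
	if primary {
		spec.DataSource = &source
	}
	return spec
}

func main() {
	fmt.Println(clusterSpec(true, "s3-repo").DataSource != nil)  // primary restores from the repo
	fmt.Println(clusterSpec(false, "s3-repo").DataSource == nil) // standby replays the WAL
}
```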
2024-03-11 17:56:47 -07:00
Jeff McCune
3f3e36bbe9 (#48) Split workload into foundation and accounts
Problem:
The k3 and k4 clusters are getting the Zitadel components which are
really only intended for the core cluster pair.

Solution:
Split the workload subtree into two, named foundation and accounts.  The
core cluster pair gets foundation+accounts while the kX clusters get
just the foundation subtree.

Result:
prod-zitadel-iam is no longer managed on k3 and k4
2024-03-11 15:20:35 -07:00
Jeff McCune
9f41478d33 (#48) Restore from Monday morning after Gary and Nate registered
Set the restore point to time="2024-03-11T17:08:58Z" level=info
msg="crunchy-pgbackrest ends" which is just after Gary and Nate
registered and were granted the cluster-admin role.
2024-03-11 10:18:45 -07:00
Jeff McCune
7b215bb8f1 (#48) Custom PGO Certs for Zitadel
The [Streaming Standby][standby] architecture requires custom tls certs
for two clusters in two regions to connect to each other.

This patch manages the custom certs following the configuration
described in the article [Using Cert Manager to Deploy TLS for Postgres
on Kubernetes][article].

NOTE: One thing not mentioned anywhere in the crunchy documentation is
how custom tls certs work with pgbouncer.  The pgbouncer service uses a
tls certificate issued by the pgo root cert, not by the custom
certificate authority.

For this reason, we use kustomize to patch the zitadel Deployment and
the zitadel-init and zitadel-setup Jobs.  The patch projects the ca
bundle from the `zitadel-pgbouncer` secret into the zitadel pods at
/pgbouncer/ca.crt

[standby]: https://access.crunchydata.com/documentation/postgres-operator/latest/architecture/disaster-recovery#streaming-standby-with-an-external-repo
[article]: https://www.crunchydata.com/blog/using-cert-manager-to-deploy-tls-for-postgres-on-kubernetes
2024-03-10 22:54:06 -07:00
Jeff McCune
78cec76a96 (#48) Restore ZITADEL from point in time full backup
A full backup was taken using:

```
kubectl annotate postgrescluster zitadel postgres-operator.crunchydata.com/pgbackrest-backup="$(date)"
```

And completed with:

```
❯ k logs -f zitadel-backup-5r6v-v5jnm
time="2024-03-10T21:52:15Z" level=info msg="crunchy-pgbackrest starts"
time="2024-03-10T21:52:15Z" level=info msg="debug flag set to false"
time="2024-03-10T21:52:15Z" level=info msg="backrest backup command requested"
time="2024-03-10T21:52:15Z" level=info msg="command to execute is [pgbackrest backup --stanza=db --repo=2 --type=full]"
time="2024-03-10T21:55:18Z" level=info msg="crunchy-pgbackrest ends"
```

This patch verifies the point in time backup is robust in the face of
the following operations:

1. pg cluster zitadel was deleted (whole namespace emptied)
2. pg cluster zitadel was re-created _without_ a `dataSource`
3. pgo initialized a new database and backed up the blank database to
   S3.
4. pg cluster zitadel was deleted again.
5. pg cluster zitadel was re-created with `dataSource` `options: ["--type=time", "--target=\"2024-03-10 21:56:00+00\""]` (Just after the full backup completed)
6. Restore completed successfully.
7. Applied the holos zitadel component.
8. Zitadel came up successfully and user login worked as expected.

- [x] Perform an in place [restore][restore] from [s3][bucket].
- [x] Set repo1-retention-full to clear warning

[restore]: https://access.crunchydata.com/documentation/postgres-operator/latest/tutorials/backups-disaster-recovery/disaster-recovery#restore-properties
[bucket]: https://access.crunchydata.com/documentation/postgres-operator/latest/tutorials/backups-disaster-recovery/disaster-recovery#cloud-based-data-source
2024-03-10 17:42:54 -07:00
Jeff McCune
0e98ad2ecb (#48) Zitadel Backups
This patch configures backups suitable to support the [Streaming Standby
with an External Repo][0] architecture.

- [x] PGO [Multiple Backup Repositories][1] to k8s pv and s3.
- [x] [Encryption][2] of backups to S3.
- [x] [Remove SUPERUSER][3] role from zitadel-admin pg user to work with pgbouncer.  Resolves zitadel-init job failure.
- [x] Take a [Manual Backup][5]

[0]: https://access.crunchydata.com/documentation/postgres-operator/latest/architecture/disaster-recovery#streaming-standby-with-an-external-repo
[1]: https://access.crunchydata.com/documentation/postgres-operator/latest/tutorials/backups-disaster-recovery/backups#set-up-multiple-backup-repositories
[2]: https://access.crunchydata.com/documentation/postgres-operator/latest/tutorials/backups-disaster-recovery/backups#encryption
[3]: https://github.com/CrunchyData/postgres-operator/issues/3095#issuecomment-1904712211
[4]: https://access.crunchydata.com/documentation/postgres-operator/latest/tutorials/backups-disaster-recovery/disaster-recovery#streaming-standby-with-an-external-repo
[5]: https://access.crunchydata.com/documentation/postgres-operator/latest/tutorials/backups-disaster-recovery/backup-management#taking-a-one-off-backup
2024-03-10 16:38:56 -07:00
Jeff McCune
ac03f64724 (#45) Configure ZITADEL to use pgbouncer 2024-03-09 09:44:33 -08:00
Jeff McCune
bea4468972 (#42) Remove cert manager db ca components
Simpler to let postgres manage the certs.  TLS is in verify-full mode
with the pgo configured certs.
2024-03-08 21:34:26 -08:00
Jeff McCune
224adffa15 (#42) Add holos components for zitadel with postgres
To establish the canonical https://login.ois.run identity issuer on the
core cluster pair.

Custom resources for PGO have been imported with:

    timoni mod vendor crds -f deploy/clusters/core2/components/prod-pgo-crds/prod-pgo-crds.gen.yaml

Note, the zitadel tls connection took some considerable effort to get
working.  We intentionally use pgo issued certs to reduce the toil of
managing certs issued by cert manager.

The default tls configuration of pgo is pretty good with verify full
enabled.
2024-03-08 21:29:25 -08:00
Jeff McCune
b4d34ffdbc (#42) Fix incorrect ceph pool for core2 cluster
The core2 cluster cannot provision pvcs because it's using the k8s-dev
pool when it has credentials valid only for the k8s-prod pool.

This patch adds an entry to the platform cluster map to configure the
pool for each cluster, with a default of k8s-dev.
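
The lookup-with-default described above can be sketched as a small Go function (the map contents and function name are illustrative; the real definition lives in the CUE platform cluster map):

```go
package main

import "fmt"

// cephPool looks up the pool for a cluster in the platform cluster map,
// falling back to the k8s-dev default described above.
func cephPool(pools map[string]string, cluster string) string {
	if pool, ok := pools[cluster]; ok {
		return pool
	}
	return "k8s-dev"
}

func main() {
	// Illustrative map entry: core2 has credentials only for k8s-prod.
	pools := map[string]string{"core2": "k8s-prod"}
	fmt.Println(cephPool(pools, "core2"), cephPool(pools, "k3"))
}
```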
2024-03-08 13:14:27 -08:00
Jeff McCune
a85db9cf5e (#42) Add KustomizeBuild holos component type to install pgo
PGO uses plain yaml and kustomize as the recommended installation
method.  Holos supports upstream by adding a new PlainFiles component
kind, which simply copies files into place and lets kustomize handle the
generation of the api objects.

Cue is responsible for very little in this kind of component, basically
allowing overlay resources if needed and deferring everything else to
the holos cli.

The holos cli in turn is responsible for executing kubectl kustomize
build on the input directory to produce the rendered output, then writes
the rendered output into place.
2024-03-08 11:27:42 -08:00
Jeff McCune
4501ceec05 (#40) Use baseline security context for GitHub arc
Without this patch the arc controller fails to create a listener.  The
template for the listener doesn't appear to be configurable from the
chart.

We could patch the listener pod template with kustomize; do this as a
follow-up feature.

With this patch we get the expected two pods in the runner system
namespace:

```
❯ k get pods
NAME                                 READY   STATUS    RESTARTS   AGE
gha-rs-7db9c9f7-listener             1/1     Running   0          43s
gha-rs-controller-56bb9c77d9-6tjch   1/1     Running   0          8s
```
2024-03-07 22:37:50 -08:00
Jeff McCune
4183fdfd42 (#40) Note the helm release name is the installation name
Which is the value of the `runs-on` field in workflows.
2024-03-07 22:37:50 -08:00
Jeff McCune
2595793019 (#40) Do not force the namespace with kustomize
To avoid confining the custom resource definitions to a namespace.
2024-03-07 22:37:50 -08:00
Jeff McCune
aa3d1914b1 (#40) Manage the actions runner scale sets 2024-03-07 22:37:49 -08:00
Jeff McCune
679ddbb6bf (#40) Use Restricted pod security for arc runners
Might as well put the restriction in place before deploying the runners
to see what breaks.
2024-03-07 22:37:49 -08:00
Jeff McCune
b1d7d07a04 (#40) Add field for helm chart release name
The resource names for the arc controller are too long:

```
❯ k get pods -n arc-systems
NAME                                                              READY   STATUS    RESTARTS   AGE
gha-runner-scale-set-controller-gha-rs-controller-6bdf45bd6jx5n   1/1     Running   0          59m
```

Solve the problem by allowing components to set the release name to
`gha-rs-controller` which requires an additional field from the cue code
to differentiate from the chart name.
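
The new field can be sketched in Go (field names are hypothetical approximations of the component API): the release name defaults to the chart name unless the CUE code overrides it, shortening the prefix helm bakes into resource names.

```go
package main

import "fmt"

// HelmChart sketches the two fields discussed above: the chart name and an
// optional release-name override supplied from CUE.
type HelmChart struct {
	Name        string // chart name
	ReleaseName string // optional override; empty means "use the chart name"
}

// releaseName returns the override when set, otherwise the chart name.
func (c HelmChart) releaseName() string {
	if c.ReleaseName != "" {
		return c.ReleaseName
	}
	return c.Name
}

func main() {
	long := HelmChart{Name: "gha-runner-scale-set-controller"}
	short := HelmChart{Name: "gha-runner-scale-set-controller", ReleaseName: "gha-rs-controller"}
	fmt.Println(long.releaseName())
	fmt.Println(short.releaseName())
}
```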
2024-03-07 20:40:31 -08:00
Jeff McCune
5f58263232 (#40) Create arc namespaces
Named after the upstream install guide, though arc-systems makes me
twitch for arc-system.
2024-03-07 20:37:35 -08:00
Jeff McCune
b6bdd072f7 (#40) Include crds when running helm template
Might need to make this a configurable option, but for now just always
do it.
2024-03-07 20:37:35 -08:00
Jeff McCune
509f2141ac (#40) Actions Runner Controller
This patch adds support for helm OCI charts, which are used by the
gha-runner-scale-set-controller.

For example, arc is installed normally with:

```
NAMESPACE="arc-systems"
helm install arc \
    --namespace "${NAMESPACE}" \
    --create-namespace \
    oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller
```

This patch caches the oci image in the same way as the repository based
method.

Refer to: https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/quickstart-for-actions-runner-controller
2024-03-07 20:37:35 -08:00
Jeff McCune
4c2bc34d58 (#32) SecretStore Component
Separate the SecretStore resources from the namespaces component
because combining them creates a deadlock: the secretstore crds don't
get applied until the eso component is managed.

The namespaces component should have nothing but core api objects, no
custom resources.
2024-03-07 16:01:22 -08:00
Jeff McCune
340715f76c (#36) Provide certs to Cockroach DB and Zitadel with ExternalSecrets
This patch switches CockroachDB to use certs provided by ExternalSecrets
instead of managing Certificate resources in-cluster from the upstream
helm chart.

This paves the way for multi-cluster replication by moving certificates
outside of the lifecycle of the workload cluster cockroach db operates
within.

Closes: #36
2024-03-06 10:38:47 -08:00
Jeff McCune
64ffacfc7a (#36) Add Cockroach Issuer for Zitadel to provisioner cluster
Issuing mtls certs for cockroach db moves to the provisioner cluster so
we can more easily support cross cluster replication in the future.
crdb certs will be synced the same as public tls certs, using
ExternalSecret resources.
2024-03-06 09:36:20 -08:00
Jeff McCune
fd5a2fdbc1 (#36) Sync certs as ExternalSecrets from workload clusters
This patch replaces the httpbin and login cert on the workload clusters
with an ExternalSecret to sync the tls cert from the provisioner
cluster.
2024-03-05 17:05:10 -08:00
Jeff McCune
eb3e272612 (#36) Dynamically generate cluster certs from Platform spec
Each cluster should be more or less identical, configure certs from the
dynamic list of platform clusters.
2024-03-05 16:44:35 -08:00
Jeff McCune
2b3b5a4887 (#36) Issue login and httpbin certs
This patch uses cert manager in the provisioner cluster to provision tls
certs for https://login.example.com and https://httpbin.k2.example.com

The certs are not yet synced to the clusters.  Next step is to replace
the Certificate resources with ExternalSecret resources, then remove
cert manager from the workload clusters.
2024-03-05 14:27:37 -08:00
Jeff McCune
7426e8f867 (#36) Move cert-manager to the provisioner cluster
This patch moves certificate management to the provisioner cluster to
centralize all secrets into the highly secured cluster.  This change
also simplifies the architecture in a number of ways:

1. Certificate lives are now completely independent of cluster
   lifecycle.
2. Remove the need for bi-directional sync to save cert secrets.
3. Workload clusters no longer need access to DNS.
2024-03-05 12:51:58 -08:00
Jeff McCune
b3f682453d (#31) Inject istio sidecar into Deployment zitadel using Kustomize
Multiple holos components rely on kustomize to modify the output of the
upstream helm chart, for example patching a Deployment to inject the
istio sidecar.

The new holos cue based component system did not support running
kustomize after helm template.  This patch adds the kustomize execution
if two fields are defined in the helm chart kind of cue output.

The API spec is pretty loose in this patch but I'm proceeding for
expedience and to inform the final API with more use cases as more
components are migrated to cue.
2024-03-05 09:56:39 -08:00