Commit Graph

142 Commits

Author SHA1 Message Date
Jeff McCune
b86fee04fc (#48) v0.55.4 to rebuild k3, k4, k5 v0.55.4 2024-03-11 08:48:07 -07:00
Jeff McCune
c78da6949f Merge pull request #51 from holos-run/jeff/48-zitadel-backups
(#48) Custom PGO Certs for Zitadel
2024-03-10 23:08:29 -07:00
Jeff McCune
7b215bb8f1 (#48) Custom PGO Certs for Zitadel
The [Streaming Standby][standby] architecture requires custom tls certs
for two clusters in two regions to connect to each other.

This patch manages the custom certs following the configuration
described in the article [Using Cert Manager to Deploy TLS for Postgres
on Kubernetes][article].

NOTE: One thing not mentioned anywhere in the crunchy documentation is
how custom tls certs work with pgbouncer.  The pgbouncer service uses a
tls certificate issued by the pgo root cert, not by the custom
certificate authority.

For this reason, we use kustomize to patch the zitadel Deployment and
the zitadel-init and zitadel-setup Jobs.  The patch projects the ca
bundle from the `zitadel-pgbouncer` secret into the zitadel pods at
/pgbouncer/ca.crt

[standby]: https://access.crunchydata.com/documentation/postgres-operator/latest/architecture/disaster-recovery#streaming-standby-with-an-external-repo
[article]: https://www.crunchydata.com/blog/using-cert-manager-to-deploy-tls-for-postgres-on-kubernetes
2024-03-10 22:54:06 -07:00
Jeff McCune
78cec76a96 (#48) Restore ZITADEL from point in time full backup
A full backup was taken using:

```
kubectl annotate postgrescluster zitadel postgres-operator.crunchydata.com/pgbackrest-backup="$(date)"
```

And completed with:

```
❯ k logs -f zitadel-backup-5r6v-v5jnm
time="2024-03-10T21:52:15Z" level=info msg="crunchy-pgbackrest starts"
time="2024-03-10T21:52:15Z" level=info msg="debug flag set to false"
time="2024-03-10T21:52:15Z" level=info msg="backrest backup command requested"
time="2024-03-10T21:52:15Z" level=info msg="command to execute is [pgbackrest backup --stanza=db --repo=2 --type=full]"
time="2024-03-10T21:55:18Z" level=info msg="crunchy-pgbackrest ends"
```

This patch verifies the point in time backup is robust in the face of
the following operations:

1. pg cluster zitadel was deleted (whole namespace emptied)
2. pg cluster zitadel was re-created _without_ a `dataSource`
3. pgo initailized a new database and backed up the blank database to
   S3.
4. pg cluster zitadel was deleted again.
5. pg cluster zitadel was re-created with `dataSource` `options: ["--type=time", "--target=\"2024-03-10 21:56:00+00\""]` (Just after the full backup completed)
6. Restore completed successfully.
7. Applied the holos zitadel component.
8. Zitadel came up successfully and user login worked as expected.

- [x] Perform an in place [restore][restore] from [s3][bucket].
- [x] Set repo1-retention-full to clear warning

[restore]: https://access.crunchydata.com/documentation/postgres-operator/latest/tutorials/backups-disaster-recovery/disaster-recovery#restore-properties
[bucket]: https://access.crunchydata.com/documentation/postgres-operator/latest/tutorials/backups-disaster-recovery/disaster-recovery#cloud-based-data-source
2024-03-10 17:42:54 -07:00
Jeff McCune
0e98ad2ecb (#48) Zitadel Backups
This patch configures backups suitable to support the [Streaming Standby
with an External Repo][0] architecture.

- [x] PGO [Multiple Backup Repositories][1] to k8s pv and s3.
- [x] [Encryption][2] of backups to S3.
- [x] [Remove SUPERUSER][3] role from zitadel-admin pg user to work with pgbouncer.  Resolves zitadel-init job failure.
- [x] Take a [Manual Backup][5]

[0]: https://access.crunchydata.com/documentation/postgres-operator/latest/architecture/disaster-recovery#streaming-standby-with-an-external-repo
[1]: https://access.crunchydata.com/documentation/postgres-operator/latest/tutorials/backups-disaster-recovery/backups#set-up-multiple-backup-repositories
[2]: https://access.crunchydata.com/documentation/postgres-operator/latest/tutorials/backups-disaster-recovery/backups#encryption
[3]: https://github.com/CrunchyData/postgres-operator/issues/3095#issuecomment-1904712211
[4]: https://access.crunchydata.com/documentation/postgres-operator/latest/tutorials/backups-disaster-recovery/disaster-recovery#streaming-standby-with-an-external-repo
[5]: https://access.crunchydata.com/documentation/postgres-operator/latest/tutorials/backups-disaster-recovery/backup-management#taking-a-one-off-backup
2024-03-10 16:38:56 -07:00
Jeff McCune
30bb3f183a (#50) Describe type as strings to match others 2024-03-10 11:29:19 -07:00
Jeff McCune
1369338f3c (#50) Add -n shorthand for --namespace for secrets
It's annoying holos get secret -n foo doesn't work like kubectl get
secret -n foo works.

Closes: #50
v0.55.3
2024-03-10 10:45:49 -07:00
Jeff McCune
ac03f64724 (#45) Configure ZITADEL to use pgbouncer 2024-03-09 09:44:33 -08:00
Jeff McCune
bea4468972 (#42) Remove cert manager db ca components
Simpler to let postgres manage the certs.  TLS is in verify-full mode
with the pgo configured certs.
v0.55.2
2024-03-08 21:34:26 -08:00
Jeff McCune
224adffa15 (#42) Add holos components for zitadel with postgres
To establish the canonical https://login.ois.run identity issuer on the
core cluster pair.

Custom resources for PGO have been imported with:

    timoni mod vendor crds -f deploy/clusters/core2/components/prod-pgo-crds/prod-pgo-crds.gen.yaml

Note, the zitadel tls connection took some considerable effort to get
working.  We intentionally use pgo issued certs to reduce the toil of
managing certs issued by cert manager.

The default tls configuration of pgo is pretty good with verify full
enabled.
2024-03-08 21:29:25 -08:00
Jeff McCune
b4d34ffdbc (#42) Fix incorrect ceph pool for core2 cluster
The core2 cluster cannot provision pvcs because it's using the k8s-dev
pool when it has credentials valid only for the k8s-prod pool.

This patch adds an entry to the platform cluster map to configure the
pool for each cluster, with a default of k8s-dev.
v0.55.1
2024-03-08 13:14:27 -08:00
Jeff McCune
a85db9cf5e (#42) Add KustomizeBuild holos component type to install pgo
PGO uses plain yaml and kustomize as the recommended installation
method.  Holos supports upstream by adding a new PlainFiles component
kind, which simply copies files into place and lets kustomize handle the
generation of the api objects.

Cue is responsible for very little in this kind of component, basically
allowing overlay resources if needed and deferring everything else to
the holos cli.

The holos cli in turn is responsible for executing kubectl kustomize
build on the input directory to produce the rendered output, then writes
the rendered output into place.
v0.55.0
2024-03-08 11:27:42 -08:00
Jeff McCune
990c82432c (#40) Fix go releaser with standard arc runners
Standard arc runner image is missing gpg and git.
v0.54.1
2024-03-07 22:59:15 -08:00
Jeff McCune
e3673b594c Merge pull request #41 from holos-run/jeff/40-actions-runners
(#40) Actions Runner Controller (Runner Scale Sets)
2024-03-07 22:43:16 -08:00
Jeff McCune
f8cf278a24 (#40) bump to v0.54.0 2024-03-07 22:37:51 -08:00
Jeff McCune
b0bc596a49 (#40) Update workflow to run on arc runner set
Matches the value of the github/arc/runner component helm release, which
is the installation name.
2024-03-07 22:37:51 -08:00
Jeff McCune
4501ceec05 (#40) Use baseline security context for GitHub arc
Without this patch the arc controller fails to create a listener.  The
template for the listener doesn't appear to be configurable from the
chart.

Could patch the listener pod template with kustomize, do this as a
follow up feature.

With this patch we get the expected two pods in the runner system
namespace:

```
❯ k get pods
NAME                                 READY   STATUS    RESTARTS   AGE
gha-rs-7db9c9f7-listener             1/1     Running   0          43s
gha-rs-controller-56bb9c77d9-6tjch   1/1     Running   0          8s
```
2024-03-07 22:37:50 -08:00
Jeff McCune
4183fdfd42 (#40) Note the helm release name is the installation name
Which is the value of the `runs-on` field in workflows.
2024-03-07 22:37:50 -08:00
Jeff McCune
2595793019 (#40) Do not force the namespace with kustomize
To avoid confining the custom resource definitions to a namespace.
2024-03-07 22:37:50 -08:00
Jeff McCune
aa3d1914b1 (#40) Manage the actions runner scale sets 2024-03-07 22:37:49 -08:00
Jeff McCune
679ddbb6bf (#40) Use Restricted pod security for arc runners
Might as well put the restriction in place before deploying the runners
to see what breaks.
2024-03-07 22:37:49 -08:00
Jeff McCune
b1d7d07a04 (#40) Add field for helm chart release name
The resource names for the arc controller are too long:

❯ k get pods -n arc-systems
NAME                                                              READY   STATUS    RESTARTS   AGE
gha-runner-scale-set-controller-gha-rs-controller-6bdf45bd6jx5n   1/1     Running   0          59m

Solve the problem by allowing components to set the release name to
`gha-rs-controller` which requires an additional field from the cue code
to differentiate from the chart name.
2024-03-07 20:40:31 -08:00
Jeff McCune
5f58263232 (#40) Create arc namespaces
Named after the upstream install guide, though arc-systems makes me
twitch for arc-system.
2024-03-07 20:37:35 -08:00
Jeff McCune
b6bdd072f7 (#40) Include crds when running helm template
Might need to make this a configurable option, but for now just always
do it.
2024-03-07 20:37:35 -08:00
Jeff McCune
509f2141ac (#40) Actions Runner Controller
This patch adds support for helm oci images which are used by the
gha-runner-scale-set-controller.

For example, arc is installed normally with:

```
NAMESPACE="arc-systems"
helm install arc \
    --namespace "${NAMESPACE}" \
    --create-namespace \
    oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller
```

This patch caches the oci image in the same way as the repository based
method.

Refer to: https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/quickstart-for-actions-runner-controller
2024-03-07 20:37:35 -08:00
Jeff McCune
4c2bc34d58 (#32) SecretStore Component
Separate the SecretStore resources from the namespaces component because
it creates a deadlock.  The secretstore crds don't get applied until the
eso component is managed.

The namespaces component should have nothing but core api objects, no
custom resources.
v0.53.4
2024-03-07 16:01:22 -08:00
Jeff McCune
d831070f53 Trim trailing newlines from files when creating secrets
Without this patch, the pattern of echoing data (without -n) or editing
files in a directory to represent the keys of a secret results in a
trailing newline in the kubernetes Secret.

This patch trims off the trailing newline by default, with the option to
preserve it with the --trim-trailing-newlines=false flag.
v0.53.3
2024-03-06 11:21:32 -08:00
Jeff McCune
340715f76c (#36) Provide certs to Cockroach DB and Zitadel with ExternalSecrets
This patch switches CockroachDB to use certs provided by ExternalSecrets
instead of managing Certificate resources in-cluster from the upstream
helm chart.

This paves the way for multi-cluster replication by moving certificates
outside of the lifecycle of the workload cluster cockroach db operates
within.

Closes: #36
v0.53.2
2024-03-06 10:38:47 -08:00
Jeff McCune
64ffacfc7a (#36) Add Cockroach Issuer for Zitadel to provisioner cluster
Issuing mtls certs for cockroach db moves to the provisioner cluster so
we can more easily support cross cluster replication in the future.
crdb certs will be synced same as public tls certs, using ExternalSecret
resources.
v0.53.1
2024-03-06 09:36:20 -08:00
Nate McCurdy
54acea42cb Merge pull request #37 from holos-run/nate/preflight
Add 'holos preflight' command, check for GitHub CLI
2024-03-06 09:32:54 -08:00
Nate McCurdy
5ef8e75194 Fix Actions warning during Lint by updating golangci-lint-action
Warning:
> Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: golangci/golangci-lint-action@v3. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.
2024-03-05 17:42:30 -08:00
Nate McCurdy
cb2b5c0f49 Add the 'preflight' subcommand; check for GitHub access
This adds a new holos subcommand: preflight

Initially, this just checks that the GitHub CLI is installed and
authenticated.

The preflight command will be used to validate that the user has the
neccessary CLI tools, access, and authorization to start using Holos and
setup a Holos cluster.
2024-03-05 17:40:08 -08:00
Jeff McCune
fd5a2fdbc1 (#36) Sync certs as ExternalSecrets from workload clusters
This patch replaces the httpbin and login cert on the workload clusters
with an ExternalSecret to sync the tls cert from the provisioner
cluster.
2024-03-05 17:05:10 -08:00
Jeff McCune
eb3e272612 (#36) Dynamically generate cluster certs from Platform spec
Each cluster should be more or less identical, configure certs from the
dynamic list of platform clusters.
2024-03-05 16:44:35 -08:00
Nate McCurdy
9f2a51bde8 Move the RunCmd function to the util package
More than one Holos package needs to execute commands, so pull out the
runCmd from builder and move it to the util package.

This commits adds the following to the util package:
* util.RunCmd func
* util.runResult struct
2024-03-05 15:12:14 -08:00
Jeff McCune
2b3b5a4887 (#36) Issue login and httpbin certs
This patch uses cert manager in the provisioner cluster to provision tls
certs for https://login.example.com and https://httpbin.k2.example.com

The certs are not yet synced to the clusters.  Next step is to replace
the Certificate resources with ExternalSecret resources, then remove
cert manager from the workload clusters.
v0.52.0
2024-03-05 14:27:37 -08:00
Jeff McCune
7426e8f867 (#36) Move cert-manager to the provisioner cluster
This patch moves certificate management to the provisioner cluster to
centralize all secrets into the highly secured cluster.  This change
also simplifies the architecture in a number of ways:

1. Certificate lives are now completely independent of cluster
   lifecycle.
2. Remove the need for bi-directional sync to save cert secrets.
3. Workload clusters no longer need access to DNS.
2024-03-05 12:51:58 -08:00
Jeff McCune
cf0c455aa2 (#34) Add test for print secret data 2024-03-05 11:14:37 -08:00
Jeff McCune
752a3f912d (#34) Remove debug info logs v0.51.2 2024-03-05 11:05:51 -08:00
Jeff McCune
7d5852d675 (#34) Print secret data as json
Closes: #34
v0.51.1
2024-03-05 11:03:47 -08:00
Jeff McCune
66b4ca0e6c (#31) Fix helm missing in actions workflow
Causing test failures that should pass.
2024-03-05 10:11:43 -08:00
Jeff McCune
b3f682453d (#31) Inject istio sidecar into Deployment zitadel using Kustomize
Multiple holos components rely on kustomize to modify the output of the
upstream helm chart, for example patching a Deployment to inject the
istio sidecar.

The new holos cue based component system did not support running
kustomize after helm template.  This patch adds the kustomize execution
if two fields are defined in the helm chart kind of cue output.

The API spec is pretty loose in this patch but I'm proceeding for
expedience and to inform the final API with more use cases as more
components are migrated to cue.
v0.51.0
2024-03-05 09:56:39 -08:00
Jeff McCune
0c3181ae05 (#31) Add VirtualService for Zitadel
Also import the Kustomize types using:

    cue get go sigs.k8s.io/kustomize/api/types/...
2024-03-04 17:18:46 -08:00
Jeff McCune
18cbff0c13 (#31) Add tls cert for zitadel to connect to cockroach db
Cockroach DB uses tls certs for client authentication.  Issue one for
Zitadel.

With this patch Zitadel starts up bit is not yet exposted with a
VirtualService.

Refer to https://zitadel.com/docs/self-hosting/manage/configure
v0.50.2
2024-03-04 14:46:49 -08:00
Jeff McCune
b4fca0929c (#31) ExternalSecret for zitadel-masterkey 2024-03-04 14:31:27 -08:00
Jeff McCune
911d65bdc6 (#31) Setup login.ois.run with basic istio default Gateway
The istio default Gateway is the basis for what will become a dynamic
set of server entries specified from cue project data integrated with
extauthz.

For now we simply need to get the identity provider up and running as
the first step toward identity and access management.
2024-03-04 13:59:17 -08:00
Jeff McCune
2a5eccf0c1 (#33) Helm stderr logging
Log error messages from helm when building and rendering holos
components.

Closes: #33
v0.50.1
2024-03-04 13:16:51 -08:00
Jeff McCune
9db4873205 (#31) Add Cockroach DB for Zitadel
Following https://github.com/zitadel/zitadel-charts/blob/main/examples/4-cockroach-secure/README.md
v0.50.0
2024-03-04 10:31:39 -08:00
Jeff McCune
f90e83e142 (#30) Add httpbin Gateway and VirtualService
There isn't a default Gateway yet, so use a specific `httpbin` gateway
to test istio instead.
v0.49.7
2024-03-02 21:12:03 -08:00
Jeff McCune
bdd2964edb (#30) Add httpbin Service for ns istio-ingress 2024-03-02 20:39:55 -08:00