Compare commits

..

3 Commits

Author SHA1 Message Date
Jeff McCune
0d7bbbb659 (#48) Disable pg spec.dataSource for standby cluster
Problem:
The standby cluster on k2 fails to start.  A pgbackrest pod first
restores the database from S3, then the pgha nodes try to replay the WAL
as part of the standby initialization process.  This fails because the
PGDATA directory is not empty.

Solution:
Specify the spec.dataSource field only when the cluster is configured as
a primary cluster.

Result:
Non-primary clusters are standby, they skip the pgbackrest job to
restore from S3 and move straight to patroni replaying the WAL from S3
as part of the pgha pods.

One of the two pgha pods becomes the "standby leader" and restores the
WAL from S3.  The other is a cascading standby and then restores the
same WAL from the standby leader.

After 8 minutes both pods are ready.

```
❯ k get pods
NAME                               READY   STATUS    RESTARTS   AGE
zitadel-pgbouncer-d9f8cffc-j469g   2/2     Running   0          11m
zitadel-pgbouncer-d9f8cffc-xq29g   2/2     Running   0          11m
zitadel-pgha1-27w7-0               4/4     Running   0          11m
zitadel-pgha1-c5qj-0               4/4     Running   0          11m
zitadel-repo-host-0                2/2     Running   0          11m
```
2024-03-11 17:56:47 -07:00
Jeff McCune
3f3e36bbe9 (#48) Split workload into foundation and accounts
Problem:
The k3 and k4 clusters are getting the Zitadel components which are
really only intended for the core cluster pair.

Solution:
Split the workload subtree into two, named foundation and accounts.  The
core cluster pair gets foundation+accounts while the kX clusters get
just the foundation subtree.

Result:
prod-zitadel-iam is no longer managed on k3 and k4
2024-03-11 15:20:35 -07:00
Jeff McCune
9f41478d33 (#48) Restore from Monday morning after Gary and Nate registered
Set the restore point to time="2024-03-11T17:08:58Z" level=info
msg="crunchy-pgbackrest ends" which is just after Gary and Nate
registered and were granted the cluster-admin role.
2024-03-11 10:18:45 -07:00
58 changed files with 33 additions and 23 deletions

View File

@@ -8,12 +8,17 @@ let S3Secret = "pgo-s3-creds"
let ZitadelUser = _DBName
let ZitadelAdmin = "\(_DBName)-admin"
// This must be an external storage bucket for our architecture.
let BucketRepoName = "repo2"
// Restore options. Set the timestamp to a known good point in time.
// time="2024-03-11T17:08:58Z" level=info msg="crunchy-pgbackrest ends"
let RestoreOptions = ["--type=time", "--target=\"2024-03-11 17:10:00+00\""]
#KubernetesObjects & {
apiObjects: {
ExternalSecret: "pgo-s3-creds": _
PostgresCluster: db: #PostgresCluster & HighlyAvailable & {
// This must be an external storage bucket for our architecture.
let BucketRepoName = spec.backups.pgbackrest.manual.repoName
metadata: name: _DBName
metadata: namespace: #TargetNamespace
spec: {
@@ -47,36 +52,37 @@ let ZitadelAdmin = "\(_DBName)-admin"
enabled: true
}
}
// Restore from a backup
dataSource: pgbackrest: {
stanza: "db"
configuration: [{secret: name: S3Secret}]
// Restore from known good full backup taken in https://github.com/holos-run/holos/issues/48#issuecomment-1987375044
options: ["--type=time", "--target=\"2024-03-10 21:56:00+00\""]
global: {
"\(BucketRepoName)-path": "/pgbackrest/\(#TargetNamespace)/\(metadata.name)/\(BucketRepoName)"
"\(BucketRepoName)-cipher-type": "aes-256-cbc"
}
repo: {
name: BucketRepoName
s3: {
bucket: string | *"\(#Platform.org.name)-zitadel-backups"
region: string | *#Backups.s3.region
endpoint: string | *"s3.dualstack.\(region).amazonaws.com"
// Restore from backup if and only if the cluster is primary
if Cluster.primary {
dataSource: pgbackrest: {
stanza: "db"
configuration: backups.pgbackrest.configuration
// Restore from known good full backup taken
options: RestoreOptions
global: {
"\(BucketRepoName)-path": "/pgbackrest/\(#TargetNamespace)/\(metadata.name)/\(BucketRepoName)"
"\(BucketRepoName)-cipher-type": "aes-256-cbc"
}
repo: {
name: BucketRepoName
s3: backups.pgbackrest.repos[1].s3
}
}
}
// Refer to https://access.crunchydata.com/documentation/postgres-operator/latest/tutorials/backups-disaster-recovery/backups
backups: pgbackrest: {
configuration: dataSource.pgbackrest.configuration
configuration: [{secret: name: S3Secret}]
// Defines details for manual pgBackRest backup Jobs
manual: {
// Note: the repoName value must match the config keys in the S3Secret.
// This must be an external repository for backup / restore / regional failovers.
repoName: "repo2"
repoName: BucketRepoName
options: ["--type=full", ...]
}
// Defines details for performing an in-place restore using pgBackRest
restore: {
// Enables triggering a restore by annotating the postgrescluster with postgres-operator.crunchydata.com/pgbackrest-restore="$(date)"
enabled: true
repoName: BucketRepoName
}
@@ -105,7 +111,11 @@ let ZitadelAdmin = "\(_DBName)-admin"
// Full backup weekly on Sunday at 1am, differntial daily at 1am every day except Sunday.
schedules: full: string | *"0 1 * * 0"
schedules: differential: string | *"0 1 * * 1-6"
s3: dataSource.pgbackrest.repo.s3
s3: {
bucket: string | *"\(#Platform.org.name)-zitadel-backups"
region: string | *#Backups.s3.region
endpoint: string | *"s3.dualstack.\(region).amazonaws.com"
}
},
]
}

View File

@@ -1 +1 @@
55
56

View File

@@ -1 +1 @@
4
0