Previously, the holos component Result for each ArgoCD Application
resource managed as part of a BuildPlan resulted in an empty file
being written for the empty list of k8s API objects.
This patch fixes the problem by skipping the accumulated API object
output when the Result's metadata.name starts with `gitops/`.
This is kind of a hack, but it works well enough for now.
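A minimal sketch of the skip condition; the function and field names are
illustrative, not the actual holos internals:
```go
package render

import "strings"

// skipAPIObjects reports whether the accumulated api object output for a
// Result should be skipped instead of written to an empty file.
// Components named gitops/... only produce an Application resource via
// DeployFiles, so their empty object list is not written.
func skipAPIObjects(metadataName string) bool {
    return strings.HasPrefix(metadataName, "gitops/")
}
```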
Previously, components appeared to be duplicated; it was not clear to the
user that one build plan results in two components: one for the k8s YAML
and one for the GitOps ArgoCD Application resource.
```
❯ holos render component --cluster-name aws1 components/login/zitadel-server
9:27AM INF result.go:195 wrote deploy file version=0.84.1 path=deploy/clusters/aws1/gitops/zitadel-server.application.gen.yaml bytes=338
9:27AM INF render.go:92 rendered zitadel-server version=0.84.1 cluster=aws1 name=zitadel-server status=ok action=rendered
9:27AM INF render.go:92 rendered zitadel-server version=0.84.1 cluster=aws1 name=zitadel-server status=ok action=rendered
```
This patch prefixes the name of the ArgoCD Application resource, which is
implemented as a separate HolosComponent in the same BuildPlan. The
resulting output makes it clearer what is going on:
```
❯ holos render component --cluster-name aws1 components/login/zitadel-server
9:39AM INF result.go:195 wrote deploy file version=0.84.1 path=deploy/clusters/aws1/gitops/zitadel-server.application.gen.yaml bytes=338
9:39AM INF render.go:92 rendered gitops/zitadel-server version=0.84.1 cluster=aws1 name=gitops/zitadel-server status=ok action=rendered
9:39AM INF render.go:92 rendered zitadel-server version=0.84.1 cluster=aws1 name=zitadel-server status=ok action=rendered
```
The pod identity webhook component fails to render with v1alpha2. This
patch fixes the problem by providing concrete values for enableHooks and
the namespace of the helm chart holos component.
The namespace is mainly necessary to render the ArgoCD Application
resource alongside the helm chart output.
With this patch the eso-creds-manager component renders correctly. This
is a `#Kubernetes` type build plan which uses the
spec.components.resources map to manage resources.
The only issue was needing to provide the namespace to the nested holos
component inside the BuildPlan.
The ArgoCD Application resource moves to the DeployFiles field of a
separate holos component in the same build plan at
spec.components.resources.argocd. For this reason a separate Result
object is no longer necessary inside of the Holos cli for the purpose of
managing Flux or ArgoCD gitops. The CUE code can simply inline whatever
gitops resources it wants and the holos cli will write the files
relative to the cluster specific deploy directory.
Result:
```
❯ holos render component --cluster-name management components/eso-creds-manager
2:55PM INF result.go:195 wrote deploy file version=0.84.1 path=deploy/clusters/management/gitops/eso-creds-manager.application.gen.yaml bytes=350
2:55PM INF render.go:92 rendered eso-creds-manager version=0.84.1 cluster=management name=eso-creds-manager status=ok action=rendered
```
Previously holos render platform failed for the holos platform. The issue was
caused by the deployFiles field moving from the BuildPlan down to
HolosComponent.
This patch fixes the problem by placing the ArgoCD Application resource into a
separate Resources entry of the BuildPlan. The sole purpose of this additional
entry in the Resources map is to produce the Application resource alongside
any other components which are part of the build plan.
Previously, methods were defined on the API objects in the v1alpha1 API.
The API should be data structures only. This patch moves the methods
responsible for orchestrating the build plan into the internal render
package.
The result is the API is cleaner and has no methods. The render package
has corresponding data structures which simply wrap around the API
structure and implement the methods to render and return the result to
the CLI.
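A minimal sketch of the wrapping pattern, with illustrative names rather
than the real holos types:
```go
package render

import "context"

// BuildPlan stands in for the data-only API struct; the real one lives in
// the v1alpha1 API package and has no methods.
type BuildPlan struct {
    // ... data fields only ...
}

// Plan wraps the API structure and carries the orchestration methods that
// used to be defined on the API objects themselves.
type Plan struct {
    BuildPlan BuildPlan
}

// Run renders the build plan and returns the result to the CLI.
func (p *Plan) Run(ctx context.Context) error {
    // ... render each component described by p.BuildPlan ...
    return nil
}
```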
This commit compiles, but it has not been tested at all. It's almost
surely broken completely.
Previously in v1alpha1, all Holos structs were located in the same
package. This made it difficult to focus on only the structs necessary
to transfer configuration data from CUE to the `holos` cli.
This patch splits the structs into `meta` and `core` where the core
package holds the structs end users should refer to and focus on. Only
the Platform resource is in core now, but other BuildPlan types will be
added shortly.
Previously Backstage was not configured to integrate with GitHub. The
integration is necessary for Backstage to automatically discover
resources in a GitHub organization and import them into the Catalog.
This patch adds a new platform model form field and section for the
primary GitHub organization name of the platform. Additional GitHub
organizations can be added in the future; Backstage supports them.
The result is Backstage automatically scans public and private
repositories and adds the information in `catalog-info.yaml` to the UI.
Previously the gateway ArgoCD Application resource was out of sync because
the `default-istio` `ServiceAccount` is not in the git repository
source. Argo would prune the service account on sync, which is a problem.
This patch manages the service account so the Application can be synced
properly.
Previously the holos render platform command failed with an error when
giving a demo after the generate platform step.
This patch updates the internal generated holos platform to the latest
version.
Running through the demo is successful now.
```
holos logout
holos login
holos register user
holos generate platform holos
holos pull platform config .
holos render platform ./platform
```
I'm not sure if we should check in the loop, in the goroutine, or in
both places. Double check in both places just to be sure we're not doing
extra unnecessary work.
Previously a channel was used to limit concurrency. This is more
difficult to read and comprehend than the built-in errgroup SetLimit
functionality.
This patch uses `errgroup.Group`'s [SetLimit][1] method to limit
concurrency, avoid leaking goroutines, and avoid unnecessary work.
[1]: https://pkg.go.dev/golang.org/x/sync/errgroup#Group.SetLimit
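A sketch of the pattern, including the double check for a canceled
context in both the loop and the goroutine:
```go
package render

import (
    "context"
    "fmt"
    "runtime"

    "golang.org/x/sync/errgroup"
)

func renderAll(ctx context.Context, components []string) error {
    g, ctx := errgroup.WithContext(ctx)
    limit := runtime.NumCPU()
    if limit > 8 {
        limit = 8 // diminishing returns past 8 in testing
    }
    g.SetLimit(limit)
    for _, component := range components {
        component := component
        if ctx.Err() != nil {
            break // stop scheduling new work once any render fails
        }
        g.Go(func() error {
            if err := ctx.Err(); err != nil {
                return err // double check: skip work if already canceled
            }
            fmt.Println("render", component)
            return nil
        })
    }
    return g.Wait()
}
```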
This adds concurrency to the 'holos render platform' command so platform
components are rendered in less time than before.
Default concurrency is set to `min(runtime.NumCPU(), 8)`, which is the
lesser of 8 or the number of CPU cores. In testing, I found that past 8
there are diminishing or negative returns due to the memory used
rendering each component.
In practice, this reduced rendering of the saas platform components from
~90s to ~28s on my 12-core macbook pro.
This also changes the key name of the Helm Chart's version in log lines
from `version` to `chart_version` since `version` already exists and
shows the Holos CLI version.
Previously, when a user registered and logged into the holos app server,
they were able to reach admin interfaces like
https://argocd.admin.example.com
This patch adds AuthorizationPolicy resources governing the whole
cluster. Users with the prod-cluster-{admin,edit,view} roles may access
admin services like argocd.
Users without these roles are blocked with `RBAC: access denied`.
In ZITADEL, the Holos Platform project is granted to the CIAM
organization without granting the prod-cluster-* roles, so there's no
possible way a CIAM user account can have these roles.
Previously there wasn't a good way to populate the platform model in the
database after building a new instance of holos server.
With this patch, the process to reset clean is:
```
export HOLOS_SERVER=https://dev.app.holos.run:443
grpcurl -H "x-oidc-id-token: $(holos token)" ${HOLOS_SERVER##*/} holos.user.v1alpha1.SystemService.DropTables
grpcurl -H "x-oidc-id-token: $(holos token)" ${HOLOS_SERVER##*/} holos.system.v1alpha1.SystemService.SeedDatabase
```
Then populate the form and model:
```
holos push platform form .
holos push platform model .
```
The `platform.config.json` file stored in version control is pushed to
the holos server and stored in the database. This makes it easy to
reset entirely or move to another service URL.
Previously the default OIDC issuer pointed to one of the kubernetes
clusters running in my basement. This patch changes the issuer to the
production-ready issuer running in EKS.
Previously the holos server Service was not exposed.
This patch exposes the holos service with an HTTPRoute behind the auth
proxy. Holos successfully authenticates the user with the
x-oidc-id-token header set by the default Gateway.
---
Add dev-holos-infra and dev-holos-app
Previously the PostgresCluster and the holos server Deployment were not
managed on the aws2 cluster.
This patch is a start, but the Deployment does not yet start. We need
to pass an option for the oidc issuer.
---
Add namespaces and cert for prod-holos, dev-holos, jeff-holos
Previously we didn't have a place to deploy holos server. This patch
adds a namespace, creates a Gateway listener, and binds the tls certs
for app.example.com and *.app.example.com to the listeners.
In addition, cluster specific endpoints of *.app.aws2.example.com,
*.app.aws1.example.com, etc. are created to provide dev environment
urls. For example jeff.app.aws2.example.com is my personal dev hostname.
Previously holos render platform ./platform did not render any GitOps
resources for Flux or ArgoCD.
This patch uses the new DeployFiles field in holos v0.83.0 to write an
Application resource for every component BuildPlan listed in the
platform.
Previously, each BuildPlan had no clear way to produce an ArgoCD
Application resource. This patch provides a general solution where each
BuildPlan can provide arbitrary files as a map[string]string where the
key is the file path relative to the gitops repository `deploy/` folder.
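A sketch of the field's shape; the surrounding struct is illustrative:
```go
package v1alpha1

// BuildPlan sketch showing only the new field. Keys are file paths
// relative to the gitops repository deploy/ folder and values are the
// complete file contents to write.
type BuildPlan struct {
    DeployFiles map[string]string `json:"deployFiles,omitempty"`
}
```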
Previously ArgoCD has no ssh credentials to connect to GitHub. This
patch adds an ssh ed25519 key as a secret in the management cluster.
The secret is synced to the workload clusters using an ExternalSecret
with the proper label for ArgoCD to find and load it for use with any
application that references the Git URL.
Previously a logged in user could not modify anything in ArgoCD. With
this patch users who have been granted the prod-cluster-admin role in
ZITADEL are granted the admin role in ArgoCD.
Previously ArgoCD was present in the platform configuration, but not
functional. This patch brings ArgoCD fully up, integrated with the
service mesh, auth proxy, and SSO at
https://argocd.admin.clustername.example.com/
The upstream [helm chart][1] is used instead of the kustomize install
method. We had existing prior art integrating the v6 helm chart with
the holos platform identity provider, so we continue with the helm
chart.
CRDs are still managed with the kustomize version. The CRDs need to be
kept in sync. It's possible to generate the kustomization.yaml file
from the same version value as is used by the helm chart, but we don't
for the time being.
[1]: https://github.com/argoproj/argo-helm/tree/argo-cd-7.1.1/charts/argo-cd
Previously, no RequestAuthentication or AuthorizationPolicy resources
govern the default Gateway. This patch adds the resources and
configures the service mesh with the authproxy as an ExtAuthZ provider
for CUSTOM AuthorizationPolicy rules.
This patch also fixes a bug in the zitadel-server component where
resources from the upstream helm chart did not specify a namespace.
Kustomize is used as a post processor to force all resources into the
zitadel namespace.
Add multiple HTTPRoutes to validate http2 connection reuse
This patch adds multiple HTTPRoute resources which match
*.admin.example.com. The purpose is to validate that http2 connections
are reused properly with Chrome.
With this patch, no 404 "no route" errors are encountered when navigating
between the various httpbin{1,2,3,4} urls.
Add note backupRestore will trigger a restore
The process of configuring ZITADEL to provision from a datasource will
cause an in-place restore from S3. This isn't a major issue, but users
should be aware that data added since the most recent backup will be lost.
Previously, HTTPRoute resources were in the same namespace as the
backend service, httpbin in this case. This doesn't follow the default
behavior of a Gateway listener only allowing attachment from HTTPRoute
resources in the same namespace as the Gateway.
This also complicates intercepting the authproxy path prefix and sending
it to the authproxy. We'd need to add a ReferenceGrant in the authproxy
namespace, which seems backwards and dangerous because it would grant
the application developer the ability to route requests to all Services
in the istio-gateways namespace.
This patch enables Cluster Operators to manage the HTTPRoute resources
and direct the auth proxy path prefix of `/holos/authproxy` to the auth
proxy Service in the same namespace.
ReferenceGrant resources are used to enable the HTTPRoute backend
references.
When an application developer needs to manage their own HTTPRoute, as is
the case for ZITADEL, a label selector may be used and will override
less specific HTTPRoute hostnames in the istio-gateways namespace.
With redis, the auth proxy authenticates correctly against zitadel
running in the same cluster. Validated by visiting
https://httpbin.admin.clustername.example.com/holos/authproxy
Visiting
https://httpbin.admin.clustername.example.com/holos/authproxy/auth
returns the id token in the response header, visible in the Chrome
network inspector. The ID token works as expected from multiple orgs
with project grants in ZITADEL from the Holos org to the OIS org.
This patch doesn't fully implement the auth proxy feature.
AuthorizationPolicy and RequestAuthentication resources need to be
added.
Before we do so, we need to move the HTTPRoute resources into the
gateway namespace so all of the security policies are in one place and
to simplify the process of routing requests to two backends, the
authproxy and the backend server.
Problem:
Istio 1.22 with Gateway API and HTTPRoute is mis-routing HTTP2 requests
when the tls certificate has two dns names, for example
login.example.com and *.login.example.com.
When the user visits login.example.com and then tries to visit
other.login.example.com with Chrome, the connection is reused and istio
returns a 404 route not found error even though there is a valid and
accepted HTTPRoute for *.login.example.com.
This patch attempts to fix the problem by ensuring certificate dns names
map exactly to Gateway listeners. When a wildcard cert is used, the
corresponding Gateway listener host field exactly matches the wildcard
cert dns name so Istio and envoy should not get confused.
This patch adds the ZITADEL server component, which deploys zitadel from
a helm chart. Kustomize is used heavily to patch the output of helm to
make the configuration fit nicely with the holos platform.
With this patch the two Jobs that initialize the database and setup
ZITADEL run successfully. The ZITADEL deployment starts successfully.
ZITADEL is accessible at https://login.example.com/ with the default
admin username of `zitadel-admin@zitadel.login.example.com` and password
`Password1!`.
Use grant.holos.run/subdomain.admin: "true" for HTTPRoute
This patch clarifies the label that grants httproute attachment for a
subdomain Gateway listener to a namespace.
Fix istio-base holos component name
Was named `base` which is the chart name, not the holos component name.
This patch adds the postgres clusters and a few console form controls to
configure how backups are taken and if the postgres cluster is
initialized from an existing backup or not.
The pgo-s3-creds file is manually created at this time. It looks like:
```
❯ holos get secret -n zitadel pgo-s3-creds --print-key s3.conf
[global]
repo2-cipher-pass=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
repo2-s3-key=KKKKKKKKKKKKKKKKKKKK
repo2-s3-key-secret=/SSSSSSS/SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
repo3-cipher-pass=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
repo3-s3-key=KKKKKKKKKKKKKKKKKKKK
repo3-s3-key-secret=/SSSSSSS/SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
```
The s3 key and secret are credentials to read / write to the bucket.
The cipher pass is a random string for client-side encryption. Generate
it with `tr -dc A-Za-z0-9 </dev/urandom | head -c 64`.
This patch is foundational work for the ZITADEL login service.
This patch adds a tls certificate with names *.login.example.com and
login.example.com, a pair of listeners attached to the certificate in
the `default` Gateway, and the ExternalSecret to sync the secret from
the management cluster.
The zitadel namespace is managed and has the label
holos.run/login.grant: "true" to grant HTTPRoute attachment from the
zitadel namespace to the default Gateway in the istio-gateways
namespace.
With this change, https://httpbin.admin.aws1.example.com works as
expected.
PROXY protocol is configured on the AWS load balancer and the istio
gateway. The istio gateway logs have the correct client source ip
address and x-forwarded-for headers.
Namespaces must have the holos.run/admin.grant: "true" label in order to
attach an HTTPRoute to the admin section of the default Gateway.
The TLS certificate is working as expected and hopefully does not suffer
from the NR route not found issue encountered with the Istio Gateway
API.
This patch gets the istio-ingressgateway up and running in AWS with
minimal configuration. No authentication or authorization policies have
been migrated from previous iterations of the platform. These will be
handled in subsequent iterations.
Connectivity to a backend service like httpbin has not yet been tested.
This will happen in a follow up as well using /httpbin path prefixes on
existing services like argocd to conserve certificate resources.
This is the standard way to issue public facing certificates. Be aware
of the 50 certs per week limit from Let's Encrypt. We map names to certs
1:1 to avoid http2 connection reuse issues with istio.
Manage certificates on a project basis similar to how namespaces
associated with each project are managed.
Manage the Certificate resources on the management cluster in the
istio-ingress namespace so the tls certs can be synced to the workload
clusters.
The secretstores component is critical and provides the mechanism to
securely fetch Secret resources from the Management Cluster.
The holos server and configuration code stored in version control
contains only ExternalSecret references, no actual secrets.
This component adds a `default` `SecretStore` to each management
namespace which uses the `eso-reader` service account token to
authenticate to the management cluster. This service account is limited
to reading secrets within the namespace it resides in.
For example:
```yaml
---
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: default
  namespace: external-secrets
spec:
  provider:
    kubernetes:
      auth:
        token:
          bearerToken:
            key: token
            name: eso-reader
      remoteNamespace: external-secrets
      server:
        caBundle: Long Base64 encoded string
        url: https://34.121.54.174
```
This patch adds the `eso-creds-manager` component which needs to be
applied to the management cluster prior to the `eso-creds-refresher`
component being applied to workload clusters.
The manager component configures rbac to allow the creds-refresher job
to complete.
This patch also adjusts the behavior to only create secrets for the
eso-reader account by default.
Namespaces with the label `holos.run/eso.writer=true` will also have an
eso-writer secret provisioned in their namespace, allowing secrets to be
written back to the management cluster. This is intended for the
PushSecret resource.
Use v0.81.2 to build out the holos platform. Once we have the
components structured fairly well we can circle back around and copy the
components to schematics. There's a bit of friction regenerating the
platform from schematic each time.
Using CUE definitions like #Platform to hold data is confusing. Clarify
the use of fields: definitions like #Platform define the shape (schema)
of the data, while hidden fields like _Platform represent and hold the
data.
The first thing most platforms need to do is come up with a strategy for
managing namespaces across multiple clusters.
This patch defines #Namespaces in the holos platform and adds a
namespaces component which loops over all values in the #Namespaces
struct and manages a kubernetes Namespace object.
The platform resource itself loops over all clusters in the platform to
manage all namespaces across all clusters.
From a blank slate:
```
❯ holos generate platform holos
4:26PM INF platform.go:79 wrote platform.metadata.json version=0.82.0 platform_id=018fa1cf-a609-7463-aa6e-fa53bfded1dc path=/home/jeff/workspace/holos-run/holos-infra/saas/platform.metadata.json
4:26PM INF platform.go:91 generated platform holos version=0.82.0 platform_id=018fa1cf-a609-7463-aa6e-fa53bfded1dc path=/home/jeff/workspace/holos-run/holos-infra/saas
❯ holos pull platform config .
4:26PM INF pull.go:64 pulled platform model version=0.82.0 server=https://jeff.app.dev.k2.holos.run:443 platform_id=018fa1cf-a609-7463-aa6e-fa53bfded1dc
4:26PM INF pull.go:75 saved platform config version=0.82.0 server=https://jeff.app.dev.k2.holos.run:443 platform_id=018fa1cf-a609-7463-aa6e-fa53bfded1dc path=platform.config.json
❯ (cd components && holos generate component cue namespaces)
4:26PM INF component.go:147 generated component version=0.82.0 name=namespaces path=/home/jeff/workspace/holos-run/holos-infra/saas/components/namespaces
❯ holos render platform ./platform/
4:26PM INF platform.go:29 ok render component version=0.82.0 path=components/namespaces cluster=management num=1 total=2 duration=464.055541ms
4:26PM INF platform.go:29 ok render component version=0.82.0 path=components/namespaces cluster=aws1 num=2 total=2 duration=467.978499ms
```
The result:
```sh
cat deploy/clusters/management/components/namespaces/namespaces.gen.yaml
```
```yaml
---
metadata:
  name: holos
  labels:
    kubernetes.io/metadata.name: holos
kind: Namespace
apiVersion: v1
```
Without this patch the
holos.platform.v1alpha1.PlatformService.CreatePlatform method doesn't
work as expected. The Platform message is used, which incorrectly
requires a client-supplied id that is ignored by the server.
This patch allows the creation of a new platform by reusing the update
operation as a mutation that applies to both create and update. Only
modifiable fields are part of the PlatformMutation message.
This patch adds two more example helm chart based components. podinfo
installs as a normal https repository based helm chart. podinfo-oci
uses an oci image to manage the helm chart.
The way holos handles OCI images is subtle, so it's good to include an
example right out of the chute. GitHub Actions uses OCI images, for
example.
This patch adds a command to generate CUE based holos components from
examples embedded in the executable. The examples are passed through
the go template rendering engine with values pulled from flags.
Each directory in the embedded filesystem becomes a unique command for
nice tab completion. The `--name` flag defaults to "example" and is the
resulting component name.
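A minimal sketch of the approach, assuming an embedded components
directory; the paths and names are illustrative:
```go
package generate

import (
    "embed"
    "io/fs"
    "os"
    "text/template"
)

//go:embed components
var examples embed.FS

// Render writes one embedded example file through the Go template
// rendering engine with data populated from flags such as --name.
func Render(component, file string, data any) error {
    b, err := fs.ReadFile(examples, "components/"+component+"/"+file)
    if err != nil {
        return err
    }
    tmpl, err := template.New(file).Parse(string(b))
    if err != nil {
        return err
    }
    return tmpl.Execute(os.Stdout, data)
}
```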
A follow up patch with more flags will set the stage for a Helm
component schematic.
```
holos generate component cue minimal
```
```txt
3:07PM INF component.go:91 generated component version=0.80.2 name=example path=/home/jeff/holos/dev/bare/components/example
```
Split holos render into component and platform.
This patch splits the previous `holos render` command into subcommands.
`holos render component ./path/to/component/` behaves as the previous
`holos render` command and renders an individual component.
The new `holos render platform ./path/to/platform/` subcommand makes
space to render the entire platform using the platform model pulled from
the PlatformService.
Starting with an empty directory:
```sh
holos register user
holos generate platform bare
holos pull platform config .
holos render platform ./platform/
```
```txt
10:01AM INF platform.go:29 ok render component version=0.80.2 path=components/configmap cluster=k1 num=1 total=1 duration=448.133038ms
```
The bare platform has a single component which refers to the platform
model pulled from the PlatformService:
```sh
cat deploy/clusters/mycluster/components/platform-configmap/platform-configmap.gen.yaml
```
```yaml
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: platform
  namespace: default
data:
  platform: |
    spec:
      model:
        cloud:
          providers:
            - cloudflare
        cloudflare:
          email: platform@openinfrastructure.co
        org:
          displayName: Open Infrastructure Services
          name: ois
```
This patch adds a subcommand to pull the data necessary to construct a
PlatformConfig DTO. The PlatformConfig message contains all of the
fields and values necessary to build a platform and the platform
components. This is an alternative to holos passing multiple tags to
CUE. The PlatformConfig is marshalled and passed once.
The platform config is also stored in the local filesystem in the root
directory of the platform. This enables repeated local building and
rendering without making an rpc call.
The build / render pipeline is expected to cache the PlatformConfig once
at the start of the pipeline using the pull subcommand.
The `holos render platform` command is unimplemented. This patch
partially implements platform rendering by fetching the platform model
from the PlatformService and providing it to CUE using a tag.
CUE returns a `kind: Platform` resource to `holos`, which will eventually
process a BuildPlan for each platform component listed in the Platform
spec.
For now, however, it's sufficient to have the current platform model
available to CUE.
Problem:
Rendering the whole platform doesn't need a cluster name.
Solution:
Make the flag optional, do not set the cue tag if it's empty.
Result:
Holos renders the platform resource and proceeds to the point where we
need to implement the iteration over platform components, passing the
platform model to each one and rendering the component.
We need to output a kind: Platform resource from cue so holos can
iterate over each build plan. The platform resource itself should also
contain a copy of the platform model obtained from the PlatformService
so holos can easily pass the model to each BuildPlan it needs to execute
to render the full platform.
This patch lays the groundwork for the Platform resource. A future
patch will have the holos cli obtain the platform model and inject it as
a JSON encoded string to CUE. CUE will return the Platform resource
which is a list of references to build plans. Holos will then iterate
over each build plan, pass the model back in, and execute the build
plan.
To illustrate where we're headed, the `cue export` step will move into
`holos` with a future patch.
```
❯ holos register user
3:34PM INF register.go:77 user version=0.80.0 email=jeff@ois.run server=https://app.dev.k2.holos.run:443 user_id=018f8839-3d74-7e39-afe9-181ad2fc8abe org_id=018f8839-3d74-7e3a-918c-b36494da0115
❯ holos generate platform bare
3:34PM INF generate.go:79 wrote platform.metadata.json version=0.80.0 platform_id=018f8839-3d74-7e3b-8cb8-77a2c124d173 path=/home/jeff/holos/dev/bare/platform.metadata.json
3:34PM INF generate.go:91 generated platform bare version=0.80.0 platform_id=018f8839-3d74-7e3b-8cb8-77a2c124d173 path=/home/jeff/holos/dev/bare
❯ holos push platform form .
3:34PM INF push.go:70 pushed: https://app.dev.k2.holos.run:443/ui/platform/018f8839-3d74-7e3b-8cb8-77a2c124d173 version=0.80.0
❯ cue export ./platform/
{
  "metadata": {
    "name": "bare",
    "labels": {},
    "annotations": {}
  },
  "spec": {
    "model": {}
  },
  "kind": "Platform",
  "apiVersion": "holos.run/v1alpha1"
}
```
When the holos server URL switches, we also need to update the client
context to get the correct org id.
Also improve quality of life by printing the url to the form when the
platform form is pushed to the server.
```
❯ holos push platform form .
11:41AM INF push.go:71 updated platform form version=0.79.0 server=https://app.dev.k2.holos.run:443 platform_id=018f87d1-7ca2-7e37-97ed-a06bcee9b442
11:41AM INF push.go:72 https://app.dev.k2.holos.run:443/ui/platform/018f87d1-7ca2-7e37-97ed-a06bcee9b442 version=0.79.0
```
This sub-command renders the web app form from CUE code and updates the
form using the `holos.platform.v1alpha1.PlatformService/UpdatePlatform`
rpc method.
Example use case, starting fresh:
```
rm -rf ~/holos
mkdir ~/holos
cd ~/holos
```
Step 1: Login
```sh
holos login
```
```txt
9:53AM INF login.go:40 logged in as jeff@ois.run version=0.79.0 name="Jeff McCune" exp="2024-05-17 21:16:07 -0700 PDT" email=jeff@ois.run
```
Step 2: Register to create server side resources.
```sh
holos register user
```
```
9:52AM INF register.go:68 user version=0.79.0 email=jeff@ois.run user_id=018f826d-85a8-751d-81ee-64d0f2775b3f org_id=018f826d-85a8-751e-98dd-a6cddd9dd8f0
```
Step 3: Generate the bare platform in the local filesystem.
```sh
holos generate platform bare
```
```txt
9:52AM INF generate.go:79 wrote platform.metadata.json version=0.79.0 platform_id=018f826d-85a8-751f-96d0-0d2bf70df909 path=/home/jeff/holos/platform.metadata.json
9:52AM INF generate.go:91 generated platform bare version=0.79.0 platform_id=018f826d-85a8-751f-96d0-0d2bf70df909 path=/home/jeff/holos
```
Step 4: Push the platform form to the `holos server` web app.
```sh
holos push platform form .
```
```txt
9:52AM INF client.go:67 updated platform version=0.79.0 platform_id=018f826d-85a8-751f-96d0-0d2bf70df909 duration=73.62995ms
```
At this point the platform form is published and functions as expected
when visiting the platform web interface.
Makes it easier to work with grpcurl:
```
grpcurl -H "x-oidc-id-token: $(holos token)" -d '{"org_id":"'$(holos orgid)'"}' ${HOLOS_SERVER##*/} holos.platform.v1alpha1.PlatformService.ListPlatforms
```
When the user generates a platform, we need to know the platform ID it's
linked to in the holos server. If there is no platform with the same
name, the `holos generate platform` command should error out.
This is necessary because the first thing we want to show is pushing an
updated form to `holos server`. To update the web ui the CLI needs to
know the platform ID to update.
This patch modifies the generate command to obtain a list of platforms
for the org and verify the generated name matches one of the platforms
that already exists.
A future patch could have the `generate platform` command call the
`holos.platform.v1alpha1.PlatformService.CreatePlatform` method if the
platform isn't found.
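A sketch of the verification, with illustrative message and field names:
```go
package generate

import "fmt"

// Platform stands in for the rpc response message.
type Platform struct {
    ID   string
    Name string
}

// findPlatformID returns the server assigned id for the named platform,
// erroring out when no platform with the same name already exists.
func findPlatformID(platforms []Platform, name string) (string, error) {
    for _, p := range platforms {
        if p.Name == name {
            return p.ID, nil
        }
    }
    return "", fmt.Errorf("platform %q not found in the org", name)
}
```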
Results:
```sh
holos generate platform bare
```
```txt
4:15PM INF generate.go:77 wrote platform.metadata.json version=0.77.1 platform_id=018f826d-85a8-751f-96d0-0d2bf70df909 path=/home/jeff/holos/platform.metadata.json
4:15PM INF generate.go:89 generated platform bare version=0.77.1 platform_id=018f826d-85a8-751f-96d0-0d2bf70df909 path=/home/jeff/holos
```
```sh
cat platform.metadata.json
```
```json
{
"id": "018f826d-85a8-751f-96d0-0d2bf70df909",
"name": "bare",
"display_name": "Bare Platform"
}
```
This patch logs the service and rpc method of every request at Info
level. The error code and message are also logged. This gives a good
indication of what rpc methods are being called and by whom.
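Assuming a connect-go style interceptor, a sketch of the idea rather
than the exact holos implementation:
```go
package server

import (
    "context"
    "log/slog"

    "connectrpc.com/connect"
)

// loggingInterceptor logs the rpc procedure of every request at Info
// level, including the error code and message when the call fails.
func loggingInterceptor(log *slog.Logger) connect.UnaryInterceptorFunc {
    return func(next connect.UnaryFunc) connect.UnaryFunc {
        return func(ctx context.Context, req connect.AnyRequest) (connect.AnyResponse, error) {
            res, err := next(ctx, req)
            if err != nil {
                log.Info("rpc", "procedure", req.Spec().Procedure,
                    "code", connect.CodeOf(err).String(), "err", err.Error())
            } else {
                log.Info("rpc", "procedure", req.Spec().Procedure)
            }
            return res, err
        }
    }
}
```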
This patch adds a `holos register user` command. Given an authenticated
id token and no other record of the user in the database, the cli tool
uses the API to ensure:
1. The user is registered in `holos server`.
2. The user is linked to one Holos Organization.
3. The Holos Organization has the `bare` platform.
4. The Holos Organization has the `reference` platform.
5. `~/.holos/client-context.json` contains the user id and an org id.
The `holos.ClientContext` struct is intended as a lightweight way to
save and load the current organization id to the file system for further
API calls.
The assumption is most users will have only a single org. We can add
a more complicated config context system like kubectl uses if and when
we need it.
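A sketch of the save path; the field names are illustrative:
```go
package holos

import (
    "encoding/json"
    "os"
    "path/filepath"
)

// ClientContext persists the ids needed for further API calls.
type ClientContext struct {
    UserID string `json:"user_id"`
    OrgID  string `json:"org_id"`
}

// Save writes the context to ~/.holos/client-context.json.
func (c *ClientContext) Save() error {
    home, err := os.UserHomeDir()
    if err != nil {
        return err
    }
    b, err := json.MarshalIndent(c, "", "  ")
    if err != nil {
        return err
    }
    path := filepath.Join(home, ".holos", "client-context.json")
    return os.WriteFile(path, b, 0o600)
}
```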
This patch adds a generate subcommand that copies a platform embedded
into the executable to the local filesystem. The purpose is to
accelerate initial setup with canned example platforms.
Two platforms are intended to start, one bare and one reference
platform. The number of platforms embedded into holos should be kept
small (2-3) to limit our support burden.
This patch adds the GetVersion rpc method to
holos.system.v1alpha1.SystemService and wires the version information up
to the Web UI.
This is a good example to crib from later regarding fetching and
refreshing data from the web ui using grpc and field masks.
This patch refactors the API following the [API Best Practices][api]
documentation. The UpdatePlatform method is modeled after a mutating
operation described [by Netflix][nflx] instead of using a REST resource
representation. This makes it much easier to iterate over the fields
that need to be updated as the PlatformUpdateOperation is a flat data
structure while a Platform resource may have nested fields. Nested
fields are more complicated and less clear to handle with a FieldMask.
This patch also adds a snackbar message on save. Previously, the save
button didn't give any indication of success or failure. This patch
fixes the problem by adding a snackbar message that pops up nicely at
the bottom of the screen.
When the snackbar message is dismissed or times out the save button is
re-enabled.
[api]: https://protobuf.dev/programming-guides/api/
[nflx]: https://netflixtechblog.com/practical-api-design-at-netflix-part-2-protobuf-fieldmask-for-mutation-operations-2e75e1d230e4
Examples:
FieldMask for ListPlatforms
```
grpcurl -H "x-oidc-id-token: $(holos token)" -d @ ${HOLOS_SERVER##*/} holos.platform.v1alpha1.PlatformService.ListPlatforms <<EOF
{
"org_id": "018f36fb-e3f7-7f7f-a1c5-c85fb735d215",
"field_mask": { "paths": ["id","name"] }
}
EOF
```
```json
{
  "platforms": [
    {
      "id": "018f36fb-e3ff-7f7f-a5d1-7ca2bf499e94",
      "name": "bare"
    },
    {
      "id": "018f6b06-9e57-7223-91a9-784e145d998c",
      "name": "gary"
    },
    {
      "id": "018f6b06-9e53-7223-8ae1-1ad53d46b158",
      "name": "jeff"
    },
    {
      "id": "018f6b06-9e5b-7223-8b8b-ea62618e8200",
      "name": "nate"
    }
  ]
}
```
Closes: #171
This patch refactors the API to be resource-oriented around one service
per resource type. PlatformService, OrganizationService, UserService,
etc...
Validation is improved to use CEL rules provided by [protovalidate][1].
Placeholders for FieldMask and other best practices are added, but are
unimplemented as per [API Best Practices][2].
The intent is to set us up well for copying and pasting solid existing
examples as we add features.
With this patch the server and web app client are both updated to use
the refactored API, however the following are not working:
1. Update the model.
2. Field Masks.
[1]: https://buf.build/bufbuild/protovalidate
[2]: https://protobuf.dev/programming-guides/api/
This command is just a prototype of how to fetch the platform model so
we can make it available to CUE.
The idea is we take the data from the holos server and write it into a
CUE `_Platform` struct. This will probably involve converting the data
to CUE format and nesting it under the platform struct spec field.
This patch restructures the bare platform in preparation for a
`Platform` kind of output from CUE in addition to the existing
`BuildPlan` kind.
This patch establishes a pattern where our own CUE defined code goes
into three CUE module paths:
1. `internal/platforms/cue.mod/gen/github.com/holos-run/holos/api/v1alpha1`
2. `internal/platforms/cue.mod/pkg/github.com/holos-run/holos/api/v1alpha1`
3. `internal/platforms/cue.mod/usr/github.com/holos-run/holos/api/v1alpha1`
The first path is automatically generated from Go structs. The second
path is where we override and provide additional cue level integration.
The third path is reserved for the end user to further refine and
constrain our definitions.
This form goes a good way toward capturing what we need to configure the
entire reference platform. Elements and sections are responsive to
which cloud providers are selected, which achieves my goal of modeling a
reasonably advanced form using only JSON data produced by CUE.
To write the form via the API:
```
cue export ./forms/platform/ --out json \
  | jq '{platform_id: "'${platformId}'", fields: .spec.fields}' \
  | grpcurl -H "x-oidc-id-token: $(holos token)" -d @ ${host}:443 \
  holos.platform.v1alpha1.PlatformService.PutForm
```
The way we were organizing fields into sections broke Formly validation.
This patch fixes the problem by using the recommended approach of
[Nested Forms][1].
This patch also refactors the PlatformService API to clean it up.
GetForm / PutForm are separated from the Platform methods. Similarly
GetModel / PutModel are separated out and are specific to get and put
the model data.
NOTE: I'm not sure we should have separated out the platform service
into its own protobuf package. Seems maybe unnecessary.
```
❯ grpcurl -H "x-oidc-id-token: $(holos token)" -d '{"platform_id":"018f36fb-e3ff-7f7f-a5d1-7ca2bf499e94"}' jeff.app.dev.k2.holos.run:443 holos.platform.v1alpha1.PlatformService.GetModel
{
  "model": {
    "org": {
      "contactEmail": "platform@openinfrastructure.co",
      "displayName": "Open Infrastructure Services LLC",
      "domain": "ois.run",
      "name": "ois"
    },
    "privacy": {
      "country": "earth",
      "regions": [
        "us-east-2",
        "us-west-2"
      ]
    },
    "terms": {
      "didAgree": true
    }
  }
}
```
[1]: https://formly.dev/docs/examples/other/nested-formly-forms
This patch wires up a Select and a Multi Select box. This patch also
establishes a decision as it relates to Formly TypeScript / gRPC Proto3
/ CUE definitions of the form data structure. The decision is to use
gRPC as a transport for any JSON to avoid friction trying to fit Formly
types into Proto3 messages.
Note when using google.protobuf.Value messages with bufbuild/connect-es,
we need to round trip them one last time through JSON to get the
original JSON on the other side. This is because connect-es preserves
the type discriminators in the case and value fields of the message.
Refer to: [Accessing oneof
groups](https://github.com/bufbuild/protobuf-es/blob/main/docs/runtime_api.md#accessing-oneof-groups)
NOTE: On the wire, carry any JSON as field configs for expedience. I
attempted to reflect FormlyFieldConfig in protobuf, but it was too time
consuming. The loosely defined Formly JSON data API creates significant
friction when joined with a well defined protobuf API. Therefore, we do
not specify anything about the Forms API; we convey any valid JSON and
leave it up to CUE and Formly on the sending and receiving sides of the
API.
We use CUE to define our own holos form elements as a subset of the loose
Formly definitions. We further hope Formly will move toward a better JSON
data API, but it's unlikely. Consider replacing Formly entirely and
building on top of the strongly typed Angular Dynamic Forms API.
Refer to: https://github.com/ngx-formly/ngx-formly/blob/v6.3.0/src/core/src/lib/models/fieldconfig.ts#L15
Consider: https://angular.io/guide/dynamic-form
Usage:
Generate the form from CUE:
```
cue export ./forms/platform/ --out json | jq -cM | pbcopy
```
Store the form JSON in the config_values column of the platforms table.
View the form, and submit some data. Then get the data back out for use rendering the platform:
```
grpcurl -H "x-oidc-id-token: $(holos token)" -d '{"platform_id":"'${platformId}'"}' $holos holos.v1alpha1.PlatformService.GetConfig
```
```json
{
  "platform": {
    "spec": {
      "config": {
        "user": {
          "sections": {
            "org": {
              "fields": {
                "contactEmail": "jeff@openinfrastructure.co",
                "displayName": "Open Infrastructure Services LLC",
                "domain": "ois.run",
                "name": "ois"
              }
            },
            "privacy": {
              "fields": {
                "country": "earth",
                "regions": [
                  "us-east-2",
                  "us-west-2"
                ]
              }
            },
            "terms": {
              "fields": {
                "didAgree": true
              }
            }
          }
        }
      }
    }
  }
}
```
Problem:
The GetConfig response value isn't directly usable with CUE without some
gymnastics.
Solution:
Refactor the protobuf definition and response output to make the user
defined and supplied config values provided by the API directly usable
in the CUE code that defines the platform.
Result:
The top level platform config is directly usable in the
`internal/platforms/bare` directory:
```
grpcurl -H "x-oidc-id-token: $(holos token)" -d '{"platform_id":"'${platformID}'"}' $host \
  holos.v1alpha1.PlatformService.GetConfig \
  > platform.holos.json
```
Vet the user supplied data:
```
cue vet ./ -d '#PlatformConfig' platform.holos.json
```
Build the holos component. The ConfigMap consumes the user supplied
data:
```
cue export --out yaml -t cluster=k2 ./components/configmap platform.holos.json \
  | yq .spec.components
```
Note the data provided by the input form is embedded into the
ConfigMap managed by Holos:
```yaml
KubernetesObjectsList:
  - metadata:
      name: platform-configmap
    apiObjectMap:
      ConfigMap:
        platform: |
          metadata:
            name: platform
            namespace: default
            labels:
              app.holos.run/managed: "true"
          data:
            platform: |
              kind: Platform
              spec:
                config:
                  user:
                    sections:
                      org:
                        fields:
                          contactEmail: jeff@openinfrastructure.co
                          displayName: Open Infrastructure Services LLC
                          domain: ois.run
                          name: ois
              apiVersion: app.holos.run/v1alpha1
              metadata:
                name: bare
                labels: {}
                annotations: {}
              holos:
                flags:
                  cluster: k2
          kind: ConfigMap
          apiVersion: v1
Skip: false
```
Problem:
The use of google.protobuf.Any was making it awkward to work with the
data provided by the user. The structure of the form data is defined by
the platform engineer, so the intent of Any was to wrap the data in a
way we can pass over the network and persist in the database.
The escaped JSON encoding was problematic and error prone to decode on
the other end.
Solution:
Define the Platform values as a two level map with string keys, but with
protobuf message fields "sections" and "fields" respectively. Use
google.protobuf.Value from the struct package to encode the actual
value.
Result:
In TypeScript, google.protobuf.Value encodes and decodes easily to a
JSON value. On the go side, connect correctly handles the value as
well.
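A sketch of the Go side round trip with the well-known struct types,
assuming google.golang.org/protobuf:
```go
package main

import (
    "fmt"

    "google.golang.org/protobuf/encoding/protojson"
    "google.golang.org/protobuf/types/known/structpb"
)

func main() {
    // Wrap arbitrary user supplied JSON in a google.protobuf.Value.
    v, err := structpb.NewValue(map[string]any{
        "contactEmail": "jeff@openinfrastructure.co",
        "name":         "ois",
    })
    if err != nil {
        panic(err)
    }
    // protojson renders the Value as plain JSON, not an escaped string.
    b, err := protojson.Marshal(v)
    if err != nil {
        panic(err)
    }
    fmt.Println(string(b))
}
```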
No more ugly error prone escaping:
```
❯ grpcurl -H "x-oidc-id-token: $(holos token)" -d '{"platform_id":"'${platformId}'"}' $host holos.v1alpha1.PlatformService.GetConfig
{
  "sections": {
    "org": {
      "fields": {
        "contactEmail": "jeff@openinfrastructure.co",
        "displayName": "Open Infrastructure Services LLC",
        "domain": "ois.run",
        "name": "ois"
      }
    }
  }
}
```
This return value is intended to be directly usable in the CUE code, so
we may further nest the values into a platform.spec key.
This patch changes the backend to store the platform config form
definition and the config values supplied by the form as JSON in the
database.
The gRPC API does not change with this patch, but may need to depending
on how this works and how easy it is to evolve the data model and add
features.
This patch is a work in progress wiring up the form to put the values to
the holos server using grpc.
In an effort to simplify the platform configuration, the structure is a
two level map with the top level being configuration sections and the
second level being the fields associated with the config section.
To support multiple kinds of values and field controls, the values are
serialized to JSON for rpc over the network and for storage in the
database. When the values are used, either by the UI or by the `holos
render` command, they're unmarshalled and inlined into the Platform
Config data structure.
Pick back up ensuring the Platform rpc handler correctly encodes and
decodes the structure to the database.
Consider changing the config_form and config_values fields to JSON field
types in the database. It will likely make working with this a lot
easier.
With this patch we're ready to wire up the holos render command to fetch
the platform configuration and create the end to end demo.
Here's essentially what the render command will fetch and lay down as a
json file for CUE:
```
❯ grpcurl -H "x-oidc-id-token: $(holos token)" -d '{"platform_id":"018f2c4e-ecde-7bcb-8b89-27a99e6cc7a1"}' jeff.app.dev.k2.holos.run:443 holos.v1alpha1.PlatformService.GetPlatform | jq .platform.config.values
{
  "sections": {
    "org": {
      "values": {
        "contactEmail": "\"platform@openinfrastructure.co\"",
        "displayName": "\"Open Infrastructure Services LLC\"",
        "domain": "\"ois.run\"",
        "name": "\"ois\""
      }
    }
  }
}
```
This patch adds a /platform/:id route path to a PlatformDetail
component. The platform detail component calls the GetPlatform method
given the platform ID and renders the platform config form on the detail
tab.
The submit button is not yet wired up.
The API for adding platforms changes, allowing raw json bytes using the
RawConfig field. The raw bytes are not presented on the read path,
though; calling GetPlatforms provides the platform and the config form
inline in the response.
Use the `raw_config` field instead of `config` when creating the form
data.
```
❯ grpcurl -H "x-oidc-id-token: $(holos token)" -d @ jeff.app.dev.k2.holos.run:443 holos.v1alpha1.PlatformService.AddPlatform <<EOF
{
  "platform": {
    "org_id": "018f27cd-e5ac-7f98-bfe1-2dbab208a48c",
    "name": "bare2",
    "raw_config": {
      "form": "$(cue export ./forms/platform/ --out json | jq -cM | base64 -w0)"
    }
  }
}
EOF
```
This patch adds 4 fields to the Platform table:
1. Config Form represents the JSON FormlyFieldConfig for the UI.
2. Config CUE represents the CUE file containing a definition the Config Values must unify with.
3. Config Definition is the CUE definition variable name used to unify the values with the cue code. Should be #PlatformSpec in most cases.
4. Config Values represents the JSON values provided by the UI.
The use case is the platform engineer defines the #PlatformSpec in cue,
and provides the form field config. The platform engineer then provides
1-3 above when adding or updating a Platform.
The UI then presents the form to the end user and provides values for 4
when the user submits the form.
This patch also refactors the AddPlatform method to accept a Platform
message. To do so we make the id field optional since it is server
assigned.
The patch also adds a database constraint to ensure platform names are
unique within the scope of an organization.
Results:
Note how the CUE representation of the Platform Form is exported to JSON
then converted to a base64 encoded string, which is the protobuf JSON
representation of a `bytes` value.
```
grpcurl -H "x-oidc-id-token: $(holos token)" -d @ jeff.app.dev.k2.holos.run:443 holos.v1alpha1.PlatformService.AddPlatform <<EOF
{
  "platform": {
    "id": "0d3dc0c0-bbc8-41f8-8c6e-75f0476509d6",
    "org_id": "018f27cd-e5ac-7f98-bfe1-2dbab208a48c",
    "name": "bare",
    "config": {
      "form": "$(cd internal/platforms/bare && cue export ./forms/platform/ --out json | jq -cM | base64 -w0)"
    }
  }
}
EOF
```
Note the requested platform ID is ignored.
```
{
  "platforms": [
    {
      "id": "018f2af9-f7ba-772a-9db6-f985ece8fed1",
      "timestamps": {
        "createdAt": "2024-04-29T17:49:36.058379Z",
        "updatedAt": "2024-04-29T17:49:36.058379Z"
      },
      "name": "bare",
      "creator": {
        "id": "018f27cd-e591-7f98-a9d2-416167282d37"
      },
      "config": {
        "form": "eyJhcGlWZXJzaW9uIjoiZm9ybXMuaG9sb3MucnVuL3YxYWxwaGExIiwia2luZCI6IlBsYXRmb3JtRm9ybSIsIm1ldGFkYXRhIjp7Im5hbWUiOiJiYXJlIn0sInNwZWMiOnsic2VjdGlvbnMiOlt7Im5hbWUiOiJvcmciLCJkaXNwbGF5TmFtZSI6Ik9yZ2FuaXphdGlvbiIsImRlc2NyaXB0aW9uIjoiT3JnYW5pemF0aW9uIGNvbmZpZyB2YWx1ZXMgYXJlIHVzZWQgdG8gZGVyaXZlIG1vcmUgc3BlY2lmaWMgY29uZmlndXJhdGlvbiB2YWx1ZXMgdGhyb3VnaG91dCB0aGUgcGxhdGZvcm0uIiwiZmllbGRDb25maWdzIjpbeyJrZXkiOiJuYW1lIiwidHlwZSI6ImlucHV0IiwicHJvcHMiOnsibGFiZWwiOiJOYW1lIiwicGxhY2Vob2xkZXIiOiJleGFtcGxlIiwiZGVzY3JpcHRpb24iOiJETlMgbGFiZWwsIGUuZy4gJ2V4YW1wbGUnIiwicmVxdWlyZWQiOnRydWV9fSx7ImtleSI6ImRvbWFpbiIsInR5cGUiOiJpbnB1dCIsInByb3BzIjp7ImxhYmVsIjoiRG9tYWluIiwicGxhY2Vob2xkZXIiOiJleGFtcGxlLmNvbSIsImRlc2NyaXB0aW9uIjoiRE5TIGRvbWFpbiwgZS5nLiAnZXhhbXBsZS5jb20nIiwicmVxdWlyZWQiOnRydWV9fSx7ImtleSI6ImRpc3BsYXlOYW1lIiwidHlwZSI6ImlucHV0IiwicHJvcHMiOnsibGFiZWwiOiJEaXNwbGF5IE5hbWUiLCJwbGFjZWhvbGRlciI6IkV4YW1wbGUgT3JnYW5pemF0aW9uIiwiZGVzY3JpcHRpb24iOiJEaXNwbGF5IG5hbWUsIGUuZy4gJ0V4YW1wbGUgT3JnYW5pemF0aW9uJyIsInJlcXVpcmVkIjp0cnVlfX0seyJrZXkiOiJjb250YWN0RW1haWwiLCJ0eXBlIjoiaW5wdXQiLCJwcm9wcyI6eyJsYWJlbCI6IkNvbnRhY3QgRW1haWwiLCJwbGFjZWhvbGRlciI6InBsYXRmb3JtLXRlYW1AZXhhbXBsZS5jb20iLCJkZXNjcmlwdGlvbiI6IlRlY2huaWNhbCBjb250YWN0IGVtYWlsIGFkZHJlc3MiLCJyZXF1aXJlZCI6dHJ1ZX19XX1dfX0K"
      }
    }
  ]
}
```
This patch adds a basic AddPlatform method that adds a platform with a
name and a display name.
Next steps are to add fields for the Platform Config Form definition and
the Platform Config values submitted from the form.
Next step: AddPlatform
Also consider extracting the queries to get the requested org_id to a
helper function. This will likely eventually move to an interceptor
because every request is org scoped and needs authorization checks
against the org.
```
grpcurl -H "x-oidc-id-token: $(holos token)" -d '{"org_id":"018f27cd-e5ac-7f98-bfe1-2dbab208a48c"}' jeff.app.dev.k2.holos.run:443 holos.v1alpha1.PlatformService.GetPlatforms
```
Problem:
Platform engineers need the ability to define custom input fields for
their own platform level configuration values. The holos web UI needs
to present the platform config values in a clean way. The values
entered on the form need to make their way into the top level
Platform.spec field for use across all components and clusters in the
platform.
Solution:
Define a Platform Form in a forms cue package. The output of this
definition is intended to be sent to the holos server to provide to the
web UI.
Result:
Platform engineers can define their platform config input values in
their infrastructure repository. For example, the bare platform form
inputs are defined at `platforms/bare/forms/platform/platform-form.cue`.
This cue file produces [FormlyFieldConfig][1] output.
```console
cue export ./forms/platform/ --out yaml
```
```yaml
apiVersion: forms.holos.run/v1alpha1
kind: PlatformForm
metadata:
  name: bare
spec:
  sections:
    - name: org
      displayName: Organization
      description: Organization config values are used to derive more specific configuration values throughout the platform.
      fieldConfigs:
        - key: name
          type: input
          props:
            label: Name
            placeholder: example
            description: DNS label, e.g. 'example'
            required: true
        - key: domain
          type: input
          props:
            label: Domain
            placeholder: example.com
            description: DNS domain, e.g. 'example.com'
            required: true
        - key: displayName
          type: input
          props:
            label: Display Name
            placeholder: Example Organization
            description: Display name, e.g. 'Example Organization'
            required: true
        - key: contactEmail
          type: input
          props:
            label: Contact Email
            placeholder: platform-team@example.com
            description: Technical contact email address
            required: true
```
Next Steps:
Add a holos subcommand to produce the output and store it in the
backend. Wire the front end to fetch the form config from the backend.
[1]: https://formly.dev/docs/api/core#formlyfieldconfig
This patch adds a bare platform that does nothing but render a configmap
containing the platform config structure itself.
The definition of the platform structure is firming up. The platform
designer, which may be a holos customer, is responsible for defining the
structure of the `platform.spec` output field.
We holos developers have a reserved namespace to add configuration
fields and data in the `platform.holos` output field.
Beyond these two fields, the platform config structure has TypeMeta and
ObjectMeta fields similar to a kubernetes api object to support
versioning the platform config data, naming the platform, annotating the
platform, and labeling the platform.
The path forward from here is to:
1. Eventually move the stable definitions into a CUE module that gets imported into the user's package.
2. As a platform designer, add the organization field to the #PlatformSpec definition as a CUE definition.
3. As a platform designer, add the organization field Form data structure as a JSON file.
4. Add an API to upload the #PlatformSpec cue file and the #PlatformSpec form json file to the saas backend.
5. Wire up Angular to pull the form json from the API and render the form.
6. Wire up Angular to write the form data to a gRPC service method.
7. Wire up the `holos cli` to read the form data from a gRPC service method.
8. Tie it all together where the holos cli renders the configmap.
This patch adds an organization "selector" that's really just a
placeholder. The active organization is the last element in the list
returned by the GetCallerOrganizations method for now.
The purpose is to make sure we have the structure in place for more than
one organization without needing to implement full support for the
feature at this early stage.
The Angular frontend is expected to call the activeOrg() method of the
OrganizationService. In the future this could store the state of which
organization the user has selected. The purpose is to return an org id
to send as a request parameter for other requests.
Note this patch also implements refresh behavior. The list of orgs is
fetched once on application load. If there is no user, or the user has
zero orgs, the user is created and an organization is added with them as
an owner. This is accomplished using observable pipes.
The pipe is tied to a refresh behavior. Clicking the org button
triggers the refresh behavior, which executes the pipe again and
notifies all subscribers.
This works quite well and should be idiomatic angular / rxjs. Clicking
the button automatically updates the UI after making the necessary API
calls.
This patch adds the OrganizationService to the Angular front end and
displays a simple list of the organizations the user is a member of in
the profile card.
There isn't a service yet to return the currently selected
organization, but that could be a simple method to return the most
recent entry in the list until we put something more complicated in
place like local storage of what the user has selected.
It may make sense to put a database constraint on the number of
organizations until we implement the feature later. It's too early to do
so now; I just want to make sure it's possible to add later.
Problem:
When loading the page the GetCallerClaims rpc method is called multiple
times unnecessarily.
Solution:
Use [shareReplay][1] to replay the last observable event for all
subscribers, including subscribers coming late to the party.
Result:
Network inspector in chrome indicates GetCallerClaims is called once and
only once.
[1]: https://rxjs.dev/api/operators/shareReplay
This patch adds a ProfileButton component which makes a ConnectRPC gRPC
call to the `holos.v1alpha1.UserService.GetCallerClaims` method and
renders the profile button based on the claims.
Note, in the network inspector there are two API calls to
`holos.v1alpha1.UserService.GetCallerClaims` which is unfortunate. A
follow up patch might be good to fix this.
Problem:
It's slow to build the angular app, compile it into the go executable,
copy it to the pod, then restart the server.
Solution:
Configure the mesh to route /ui to `ng serve` running on my local
host.
Result:
Navigating to https://jeff.app.dev.k2.holos.run/ui gets responses from
the ng development server.
Use:
ng serve --host 0.0.0.0
This patch simplifies the user and organization registration and query
for the UI. The pattern clients are expected to follow is to create if
the get fails. For example, the following pseudo-go-code is the
expected calling convention:
```go
var entity *ent.User
entity, err := Get()
if err != nil {
    if ent.MaskNotFound(err) == nil {
        // Not found: create the entity on first registration.
        entity = Create()
    } else {
        return err
    }
}
return entity
```
This patch adds the following service methods. For initial
registration, all input data comes from the id token claims of the
authenticated user.
```
❯ grpcurl -H "x-oidc-id-token: $(holos token)" jeff.app.dev.k2.holos.run:443 list | xargs -n1 grpcurl -H "x-oidc-id-token: $(holos token)" jeff.app.dev.k2.holos.run:443 list
holos.v1alpha1.OrganizationService.CreateCallerOrganization
holos.v1alpha1.OrganizationService.GetCallerOrganizations
holos.v1alpha1.UserService.CreateCallerUser
holos.v1alpha1.UserService.GetCallerClaims
holos.v1alpha1.UserService.GetCallerUser
```
The server will frequently look up the user record given the iss and sub
claims from the id token, so index them and make sure the combination of
the two is unique.
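With ent, a sketch of such an index; the field names are illustrative:
```go
package schema

import (
    "entgo.io/ent"
    "entgo.io/ent/schema/index"
)

// User is the ent schema for users (fields and mixins omitted).
type User struct {
    ent.Schema
}

// Indexes makes the frequent lookup by the id token's iss and sub claims
// hit a unique index, guaranteeing the combination is unique.
func (User) Indexes() []ent.Index {
    return []ent.Index{
        index.Fields("iss", "sub").Unique(),
    }
}
```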
The `make-provisioner-jwt` incorrectly used the choria broker password
as the provisioning token. In the reference [setup.sh][1] both the
token and the `broker_provisioning_password` are set to `s3cret` so I
confused the two, but they are actually different values.
This patch ensures the provisioning token configured in
`provisioner.yaml` matches the token embedded into the provisioning.jwt
file using `choria jwt provisioning` via the `make-provisioner-jwt`
script.
[1]: 6dbc8fd105/example/setup/templates/provisioner/provisioner.yaml (L6)
Problem:
When the ingress default Gateway AuthorizationPolicy/authpolicy-custom
rule is in place the choria machine room holos controller fails to
connect to the provisioner broker with the following error:
```
❯ holos controller run --config=agent.cfg
WARN[0000] Starting controller version 0.68.1 with config file /home/jeff/workspace/holos-run/holos/hack/choria/agent/agent.cfg leader=false
WARN[0000] Switching to provisioning configuration due to build defaults and missing /home/jeff/workspace/holos-run/holos/hack/choria/agent/agent.cfg
WARN[0000] Setting anonymous TLS mode during provisioning component=server connection=coffee.home identity=coffee.home
WARN[0000] Initial connection to the Broker failed on try 1: invalid websocket connection component=server connection=coffee.home identity=coffee.home
WARN[0000] Initial connection to the Broker failed on try 2: invalid websocket connection component=server connection=coffee.home identity=coffee.home
WARN[0002] Initial connection to the Broker failed on try 3: invalid websocket connection component=server connection=coffee.home identity=coffee.home
```
This problem is caused because the provisioning token url is set to
`wss://jeff.provision.dev.k2.holos.run:443` which has the port number
specified.
Solution:
Follow the upstream istio guidance of [Writing Host Match Policies][1]
to match host headers with or without the port specified, i.e. match
both `jeff.provision.dev.k2.holos.run` and
`jeff.provision.dev.k2.holos.run:*` in the AuthorizationPolicy hosts
list.
Result:
The controller is able to connect to the provisioner broker.
[1]: https://istio.io/latest/docs/ops/best-practices/security/#writing-host-match-policies
This patch fixes an error where the istio ingress gateway proxy failed
to verify the TLS certificate presented by the choria broker upstream
server.
Choria broker logs:
```
kubectl logs choria-broker-0
level=error msg="websocket: TLS handshake error from 10.244.1.190:36142: remote error: tls: unknown certificate\n"
```
Istio ingress logs:
```
kubectl -n istio-ingress logs -l app=istio-ingressgateway -f | grep --line-buffered '^{' | jq .
"upstream_transport_failure_reason": "TLS_error:|268435581:SSL_routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED:TLS_error_end:TLS_error_end"
```
Client curl output:
```
curl https://jeff.provision.dev.k2.holos.run
upstream connect error or disconnect/reset before headers. retried and the latest reset reason: remote connection failure, transport failure reason: TLS_error:|268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED:TLS_error_end:TLS_error_end
```
Explanation of error:
Istio defaults to expecting a tls certificate matching the downstream
host/authority, which isn't how we've configured Choria.
Refer to [ClientTLSSettings][1]
> A list of alternate names to verify the subject identity in the
> certificate. If specified, the proxy will verify that the server
> certificate’s subject alt name matches one of the specified values. If
> specified, this list overrides the value of subject_alt_names from the
> ServiceEntry. If unspecified, automatic validation of upstream presented
> certificate for new upstream connections will be done based on the
> downstream HTTP host/authority header, provided
> VERIFY_CERTIFICATE_AT_CLIENT and ENABLE_AUTO_SNI environmental variables
> are set to true.
[1]: https://istio.io/latest/docs/reference/config/networking/destination-rule/#ClientTLSSettings
This patch configures ArgoCD to log in via PKCE.
Note the changes are primarily in platform.site.cue and in ensuring the
emailDomain is set properly. Note too that the redirect URL needs to be
`/pkce/verify` when PKCE is enabled. Finally, if the setting is
reconfigured, make sure to clear cookies; otherwise the incorrect
`/auth/callback` path may be used.
Problem:
Port names in the default Gateway.spec.servers.port field must be unique
across all servers associated with the workload.
Solution:
Append the fully qualified domain name with dots replaced with hyphens.
Result:
Port name is unique.
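The same transformation, sketched in Go for illustration; the actual
mapping lives in CUE and the `https` prefix is an assumption:
```
package main

import (
	"fmt"
	"strings"
)

// portName derives a unique Gateway server port name from the protocol
// and the fully qualified domain name by replacing dots with hyphens.
func portName(protocol, fqdn string) string {
	return protocol + "-" + strings.ReplaceAll(fqdn, ".", "-")
}

func main() {
	fmt.Println(portName("https", "holos.dev.k2.ois.run"))
	// Output: https-holos-dev-k2-ois-run
}
```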
Problem:
The default gateway in one cluster gets server entries for all hosts in
the platform. This makes the list unnecessarily large, with entries for
clusters that should not be handled on the current cluster.
For example, the k2 cluster has gateway entries to route hosts for k1,
k3, k4, k5, etc...
Solution:
Add a field to the CertInfo definition representing which clusters the
host is valid on.
Result:
Hosts which are valid on all clusters, e.g. login.ois.run, have all
project clusters added to the clusters field of the CertInfo. Hosts
which are valid on a single cluster have the corresponding single entry
added.
When building resources, holos components should check if `#ClusterName`
is a field of CertInfo.clusters. If so, the host is
valid for the current cluster. If not, the host should be omitted from
the current cluster.
Doing so forces unnecessary hosts onto some projects. For example,
iam.ois.run is useless for the iam project; the primary project host is
login, which builds login.ois.run.
Some projects may not need any hosts at all.
Better to let the user specify `project: foo: hosts: foo: _` if they
want it.
This patch loops over every Gateway.spec.servers entry in the default
gateway and manages an ExternalSecret to sync the credential from the
provisioner cluster.
Problem:
A Holos Component is created for each project stage, but all hosts for
all stages in the project are added. This creates duplicates.
Solution:
Sort project hosts by their stage and map the holos component for a
stage to the hosts for that stage.
Result:
Duplicates are eliminated, the prod certs are not in the dev holos
component and vice-versa.
This patch provisions wildcard certs in the provisioning cluster. The
CN matches the project stage host's global hostname without any cluster
qualifiers.
The use of a wildcard in place of the environment name dns segment at
the leftmost position of the fully qualified dns name enables additional
environments to be configured without reissuing certificates.
This avoids the 100-names-per-certificate limit in LetsEncrypt.
Mapping each project host fqdn to the stage is unnecessary. The list of
gateway servers is constructed from each FQDN in the project.
This patch removes the unnecessary struct mappings.
Problem:
It's difficult to map and reduce the collection of project hosts when
configuring related Certificate, Gateway.spec.servers, VirtualService,
and auth proxy cookie domain settings.
Solution:
Define #ProjectHosts, which takes a project and provides Hosts, a
struct with an fqdn key and a #CertInfo value. The #CertInfo definition
is intended to provide everything needed to reduce the Hosts property to
structs useful for the problematic resources mentioned previously.
Result:
Gateway.spec.servers are mapped using #ProjectHosts
Next step is to map the Certificate resources on the provisioner
cluster.
Problem:
Adding environments to a project causes certs to be re-issued.
Solution:
Enable wildcard certs for per-environment namespaces like jeff, gary,
nate, etc...
Result:
Environments can be added to a project stage without needing the cert to
be re-issued.
This patch avoids LetsEncrypt rate limits by consolidating multiple dns
names into one certificate.
For each project host, create a certificate for each stage in the
project. The certificate contains the dns names for all clusters and
environments associated with that stage and host.
This can become quite a list, the limit is 100 dnsNames.
For the Holos project which has 7 clusters and 4 dev environments, the
number of dns names is 32 (4 envs + 4 envs * 7 clusters = 32 dns names).
Still, a much needed improvement because we're limited to 50 certs per
week.
It may be worth considering wildcards for the per-developer
environments, which are the ones we'll likely spin up the most
frequently.
This patch is a partial step toward getting the choria broker up
and running in my own namespace. The choria broker is necessary for
provisioning machine room agents such as the holos controller.
This patch adds an initial holos controller subcommand. The machine
room agent starts, but doesn't yet provision because we haven't deployed
the provisioning infrastructure yet.
Configure NATS in a 3 Node deployment with resolver authentication using
an Operator JWT.
The operator secret nkeys are stored in the provisioner cluster. Get
them with:
holos get secret -n jeff-holos nats-nsc --print-key nsc.tgz | tar -tvzf-
This patch sets up basic routing and a 404 not found page. The Home and
Clusters pages are generated from the [dashboard schematic][1]
ng generate @angular/material:dashboard home
ng generate @angular/material:dashboard cluster-list
ng g c error-not-found
[1]: https://material.angular.io/guide/schematics#dashboard-schematic
Instead of trying to hand-craft a navigation sidebar and toolbar from
Youtube videos, use the [navigation schematic][1] to quickly get a "good
enough" UI.
ng generate @angular/material:navigation nav
[1]: https://material.angular.io/guide/schematics#navigation-schematic
Angular must build output into a path compatible with the Go
http.FileServer. We cannot easily graft an fs.FS onto a sub-path, so we
need the `./ui/` path in the output. This requires special
configuration from the Angular default application builder behavior.
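A minimal sketch of why the `./ui/` prefix matters, assuming the
Angular build output is embedded at `ui/` in the Go module:
```
package main

import (
	"embed"
	"net/http"
)

// The Angular build writes into ./ui/, so the embedded tree keeps the
// ui/ prefix and requests for /ui/index.html resolve directly, with no
// need to re-root the fs.FS onto a sub-path.
//go:embed ui
var uiFS embed.FS

func main() {
	http.Handle("/ui/", http.FileServer(http.FS(uiFS)))
	http.ListenAndServe(":8080", nil)
}
```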
ng add @angular/material
```
❯ ng add @angular/material
Skipping installation: Package already installed
? Choose a prebuilt theme name, or "custom" for a custom theme: Indigo/Pink [ Preview: https://material.angular.io?theme=indigo-pink ]
? Set up global Angular Material typography styles? Yes
? Include the Angular animations module? Include and enable animations Yes
```
And add a logout command that deletes the token cache.
The token package is intended for subcommands that need to make API
calls to the holos api server, getting a token should be a simple matter
of calling the token.Get() method, which takes minimal dependencies.
This copies the login command from the previous holos cli. Wire
dependency injection and the rest of the unnecessary machinery from
kubelogin are removed, streamlined into a single function that takes a
few oidc related parameters.
This will need to be extracted out into an infrastructure service so
multiple other command line tools can easily re-use it and get the ID
token into the x-oidc-id-token header.
The upstream nats charts don't specify a namespace on each resource.
This works with helm install/upgrade, but not with helm template, which
holos uses to render the yaml.
The missing namespace causes flux to fail.
This patch uses the flux kustomization to add the target namespace to
all resources.
When rendering a holos component which contains more than one helm chart, rendering fails. It should succeed.
```
holos render --cluster-name=k2 /home/jeff/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/holos/... --log-level debug
```
```
9:03PM ERR could not execute version=0.64.2 err="could not rename: rename /home/jeff/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/holos/nats/envs/vendor553679311 /home/jeff/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/holos/nats/envs/vendor: file exists" loc=helm.go:145
```
This patch fixes the problem by moving each child item of the temporary
directory charts are installed into. This avoids the problem of moving
the parent when the parent target already exists.
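A sketch of the approach, using a hypothetical helper name; each child
is moved rather than the parent directory:
```
package helm

import (
	"fmt"
	"os"
	"path/filepath"
)

// moveContents moves each child of src into dst. Moving children
// individually avoids the "file exists" error os.Rename returns when
// the destination directory already exists.
func moveContents(src, dst string) error {
	entries, err := os.ReadDir(src)
	if err != nil {
		return err
	}
	for _, entry := range entries {
		from := filepath.Join(src, entry.Name())
		to := filepath.Join(dst, entry.Name())
		if err := os.Rename(from, to); err != nil {
			return fmt.Errorf("could not rename: %w", err)
		}
	}
	return nil
}
```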
Add Tilt back from holos server
Note with this patch the ec-creds.yaml file needs to be applied to the
provisioner and an external secret used to sync the image pull creds.
With this patch the dev instance is accessible behind the auth proxy.
pgAdmin also works from the Tilt UI.
https://jeff.holos.dev.k2.ois.run/app/start
goreleaser fails with Failure: plugin connect-query: could not find protoc plugin for name connect-query - please make sure protoc-gen-connect-query is installed and present on your $PATH
Remove the server.Config struct, not needed. Remove the app struct and
move the configuration to the main holos.Config.ServerConfig.
Add flags specific to server configuration.
With this patch logging is simplified. Subcommands have a handle on the
top level holos.Config and can get a fully configured logger from
cfg.Logger() after flag parsing happens.
Disambiguate the term `core` which should mean the core domain. The app
is a supporting domain concerned with logging and configuration
initialization early in the life cycle.
This runbook documents how to write a full database backup to a blank S3
bucket given an existing postgrescluster resource with a live, running
database.
The pgo controller needs to remove and re-create the repo for the backup
to succeed, otherwise it complains about a missing file expected from a
previous backup.
Without this patch users encounter an error from istio because it does
not have a valid Jwks from ZITADEL to verify the request when processing
a `RequestAuthentication` policy.
Fixes error `AuthProxy JWKS Error - Jwks doesn't have key to match kid or alg from Jwt`.
Occurs when accessing a protected URL for the first time after tokens have expired.
Grafana does not yet have the istio sidecar. Prometheus is accessible
through the auth proxy. Cert manager is added to the workload clusters
so tls certs can be issued for webhooks, the kube-prom-stack helm chart
uses cert manager for this purpose.
With this patch Grafana is integrated with OIDC and I'm able to log in
as an Administrator.
Problem:
The VirtualService that catches auth routes for paths, e.g.
`/holos/authproxy/istio-ingress` is bound to the default gateway which
no longer exists because it has no hosts.
Solution:
It's unnecessary and complicated to create a Gateway for every project.
Instead, put all server entries into one `default` gateway and
consolidate the list using CUE.
Result:
It's easier to reason about this system. There is only one ingress
gateway, `default` and everything gets added to it. VirtualServices
need only bind to this gateway, which has a hosts entry appropriately
namespaced for the project.
Problem:
The ZITADEL database isn't restoring into the prod-iam namespace after
moving from prod-iam-zitadel because no backup exists at the bucket
path.
Solution:
Hard-code the path to the old namespace to restore the database. We'll
figure out how to move the backups to the new location in a follow up
change.
The `prod-platform-gateway` kustomization is reconciling early:
```
ExternalSecret/istio-ingress/argocd.ois.run dry-run failed: failed to get API group resources: unable to retrieve the complete list of server APIs: external-secrets.io/v1beta1: the server could not find the requested resource
```
This patch moves ZITADEL from the prod-iam-zitadel namespace to the
projects managed prod-iam namespace, which is the prod environment of
the prod stage of the iam project.
Using the Helm chart so we can inject the istio sidecar with a kustomize
patch and tweak the configs for OIDC integration.
Login works and the istio sidecar is injected. Unfortunately, ArgoCD can
only be configured with one domain; it's not accessible at
argocd.ois.run, only argocd.k2.ois.run (or whatever cluster it's
installed into). Ideally it would use the Host header, but it does not.
RBAC is not implemented but the User Info endpoint does have group
membership so this shouldn't be a problem to implement.
This patch defines a #AuthPolicyRules struct which excludes hosts from
the blanket auth policy and includes them in specialized auth policies.
The purpose is to handle special cases like vault requests which have an
`X-Vault-Token` and `X-Vault-Request` header.
Vault does not use jwts so we cannot verify them in the mesh; we have to
pass them along to the backend.
Closes: #93
The ingress gateway auth proxy callback conflicts with the project stage
auth proxy callback for the same backend Host: header value.
This patch disambiguates them by the namespace the auth proxy resides
in.
This patch adds a `RequestAuthentication` and `AuthorizationPolicy` rule
to protect all requests flowing through the default ingress gateway.
Consider a browser request for httpbin.k2.example.com representing any
arbitrary host with a valid destination inside the service mesh. The
default ingress gateway will check if there is already an
x-oidc-id-token header, and if so validate the token is issued by
ZITADEL and the aud value contains the ZITADEL project number.
If the header is not present, the request is forwarded to oauth2-proxy
in the istio-ingress namespace. This auth proxy is configured to start
the oidc auth flow with a redirect back to /holos/oidc/callback of the
Host: value originally provided in the browser request.
Closes: #82
This patch adds an ingress gateway extauthz provider. Because ZITADEL
returns all applications associated with a ZITADEL project in the aud
claim, it makes sense to have one ingress auth proxy at the initial
ingress gateway so we can get the ID token in the request header for
backend namespaces to match using `RequestAuthentication` and
`AuthorizationPolicy`.
This change likely makes the additional per-stage auth proxies
unnecessary and over-engineered. Backend namespaces will have access to
the ID token.
It doesn't make sense to link the stage ext authz provider to the
ingress gateway because there can be only one provider per workload.
Link it instead to the backend environment and use the
`security.holos.run/authproxy` label to match the workload.
Problem:
Backend services and web apps expect to place their own credentials into
the Authorization header. oauth2-proxy writes over the authorization
header creating a conflict.
Solution:
Use the alpha configuration to place the id token into the
x-oidc-id-token header and configure the service mesh to authenticate
requests that have this header in place.
Note: ZITADEL does not use a JWT for an access token, unlike Keycloak
and Dex. The access token is not compatible with a
RequestAuthentication jwt rule so we must use the id token.
Without this patch the istio RequestAuthentication resources fail to
match because the access token from ZITADEL returned by oauth2-proxy in
the x-auth-request-access-token header is not a proper jwt.
The error is:
```
Jwt is not in the form of Header.Payload.Signature with two dots and 3 sections
```
This patch works around the problem by configuring oauth2-proxy to set
the ID token, which is guaranteed to be a proper JWT in the
authorization response headers.
Unfortunately, oauth2-proxy will only place the ID token in the
Authorization header response, which will write over any header set by a
client application. This is likely to cause problems with single page
apps.
We'll probably need to work around this issue by using the alpha
configuration to set the id token in some out-of-the-way header. We've
done this before, it'll just take some work to setup the ConfigMap and
translate the config again.
This patch configures an istio envoyExtAuthzHttp provider for each stage
in each project. An example provider for the dev stage of the holos
project is `authproxy-dev-holos`.
This patch configures the service mesh to route all requests with a uri
path prefix of `/holos/oidc` to the auth proxy associated with the
project stage.
Consider a request to https://jeff.holos.dev.k2.ois.run/holos/oidc/sign_in
This request is usually routed to the backend app, but
VirtualService/authproxy in the dev-holos-system namespace matches the
request and routes it to the auth proxy instead.
The auth proxy matches the request Host: header against the whitelist
and cookiedomain setting, which matches the suffix
`.holos.dev.k2.ois.run`. The auth proxy redirects to the oidc issuer
with a callback url of the request Host for a url of
`https://jeff.holos.dev.k2.ois.run/holos/oidc/callback`.
ZITADEL matches the callback against those registered with the app and
the app client id. A code is then sent back to the auth proxy.
The auth proxy sets a cookie named `__Secure-authproxy-dev-holos` with a
domain of `.holos.dev.k2.ois.run` from the suffix match of the
`--cookiedomain` flag.
Because this all works using paths, the `auth` prefix domains have been
removed. They're unnecessary, oauth2-proxy is available for any host
routed to the project stage at path prefix `/holos/oidc`.
Refer to https://oauth2-proxy.github.io/oauth2-proxy/features/endpoints/
for useful endpoints for debugging, replacing `/oauth2` with `/holos/oidc`.
This patch deploys oauth2-proxy and redis to the system namespace of
each stage in each project. The plan is to redirect unauthenticated
requests to the request host at the /holos/oidc/callback endpoint.
This patch removes the --redirect-uri flag, which makes the auth domain
prefix moot, so a future patch should remove those if they really are
unnecessary.
The reason to remove the --redirect-uri flag is to make sure we set the
cookie to a domain suffix of the request Host: header.
This patch adds entries to the project stage Gateway for oauth2-proxy.
Three entries for each stage are added, one for the global endpoint plus
one for each cluster.
Without this patch the auth proxy cookie domain is difficult to manage.
This patch refactors the hosts managed for each environment in a project
to better align with security domains and auth proxy session cookies.
The convention is: `<env?>.<host>.<stage?>.<cluster?>.<domain>` where
`host` can be 0..N entries with a default value of `[projectName]`.
env may be omitted for prod or the dev env of the dev stage. stage may
be omitted for prod. cluster may be omitted for the global endpoint.
For a project named `holos`:
| Project | Stage | Env | Cluster | Host |
| ------- | ----- | --- | ------- | ------ |
| holos | dev | jeff | k2 | jeff.holos.dev.k2.ois.run |
| holos | dev | jeff | global | jeff.holos.dev.ois.run |
| holos | dev | - | k2 | holos.dev.k2.ois.run |
| holos | dev | - | global | holos.dev.ois.run |
| holos | prod | - | k2 | holos.k2.ois.run |
| holos | prod | - | global | holos.ois.run |
Auth proxy:
| Project | Stage | Auth Proxy Host | Auth Cookie Domain |
| ------- | ----- | ------ | ------------------ |
| holos | dev | auth.holos.dev.ois.run | holos.dev.ois.run |
| holos | dev | auth.holos.dev.k1.ois.run | holos.dev.k1.ois.run |
| holos | dev | auth.holos.dev.k2.ois.run | holos.dev.k2.ois.run |
| holos | prod | auth.holos.ois.run | holos.ois.run |
| holos | prod | auth.holos.k1.ois.run | holos.k1.ois.run |
| holos | prod | auth.holos.k2.ois.run | holos.k2.ois.run |
Prior to this, when running the 'install' or 'build' Makefile target,
the version of holos being built was not shown even though the 'build'
target attempted to show the version.
```
.PHONY: build
build: generate ## Build holos executable.
@echo "building ${BIN_NAME} ${VERSION}"
```
For example:
```
> make install
go generate ./...
building holos
...
```
The holos version is stored in pkg/version/embedded/{major,minor,patch},
not the `Version` const. So the fix is to change the value of `VERSION`
so that it comes from those embedded files.
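A sketch of how the embedded files might be assembled into a version
string on the Go side; the actual pkg/version implementation may
differ. On the Makefile side, `VERSION` can then be assembled from the
same files with `$(shell cat ...)` style commands.
```
package version

import (
	_ "embed"
	"fmt"
	"strings"
)

//go:embed embedded/major
var major string

//go:embed embedded/minor
var minor string

//go:embed embedded/patch
var patch string

// String assembles the semantic version from the embedded files.
func String() string {
	return fmt.Sprintf("%s.%s.%s",
		strings.TrimSpace(major),
		strings.TrimSpace(minor),
		strings.TrimSpace(patch))
}
```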
Now the version of holos is shown:
```
> make install
go generate ./...
building holos 0.61.1
...
```
This also adds a new Makefile target called `show-version` which shows
the full version string (i.e. the value of `$VERSION`).
The goal of this patch is to verify each project environment is wired up
to the ingress Gateway for the project stage.
This is a necessary step to eventually configure the VirtualService and
AuthorizationPolicy to only match on the `/dump/request` path of each
endpoint for troubleshooting.
This patch uses the existing #ManagedNamespaces definition to create and
manage namespaces on the provisioner and workload clusters so that
SecretStore and eso-creds-refresher resources are managed in the project
environment namespaces and the project stage system namespace.
Provisioner cluster:
This patch creates a Certificate resource in the provisioner for each
host associated with the project. By default, one host is created for
each stage with the short hostname set to the project name.
A namespace is also created for each project for eso creds refresher to
manage service accounts for SecretStore resources in the workload
clusters.
Workload cluster:
For each env, plus one system namespace per stage:
- Namespace per env
- SecretStore per env
- ExternalSecret per host in the env
Common names for the holos project, prod stage:
- holos.k1.ois.run
- holos.k2.ois.run
- holos.ois.run
Common names for the holos project, dev stage:
- holos.dev.k1.ois.run
- holos.dev.k2.ois.run
- holos.dev.ois.run
- holos.gary.k1.ois.run
- holos.gary.k2.ois.run
- holos.gary.ois.run
- holos.jeff.k1.ois.run
- holos.jeff.k2.ois.run
- holos.jeff.ois.run
- holos.nate.k1.ois.run
- holos.nate.k2.ois.run
- holos.nate.ois.run
Usage:
holos render --cluster-name=provisioner \
~/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/provisioner/projects/...
holos render --cluster-name=k1 \
~/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/workload/projects/...
holos render --cluster-name=k2 \
~/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/workload/projects/...
This patch introduces a new BuildPlan spec.components.resources
collection, which is a map version of
spec.components.kubernetesObjectsList. The map version is much easier
to work with and produce in CUE than the list version.
The list version should be deprecated and removed prior to public
release.
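Sketched as Go structs for illustration; the field and package names
here are assumptions, not the published API:
```
package v1alpha2

// KubernetesObjects is a stub standing in for the real component type.
type KubernetesObjects struct{}

// ComponentList sketches the two shapes of the BuildPlan components.
type ComponentList struct {
	// KubernetesObjectsList is the list form, to be deprecated.
	KubernetesObjectsList []KubernetesObjects `json:"kubernetesObjectsList,omitempty"`
	// Resources is the map form keyed by component name. CUE produces
	// and unifies struct fields more naturally than list elements.
	Resources map[string]KubernetesObjects `json:"resources,omitempty"`
}
```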
The projects holos instance renders multiple holos components, each
containing kubernetes api objects defined directly in CUE.
<project>-system is intended for the ext auth proxy providers for all
stages.
<project>-namespaces is intended to create a namespace for each
environment in the project.
The intent is to expand the platform level definition of a project to
include the per-stage auth proxy and per-env role bindings. Secret
Store and ESO creds refresher resources will also be defined by the
platform level definition of a project.
This patch disallows unknown fields from CUE. The purpose is to fail
early if there is a typo in a nested field name and to speed up
refactoring the reference platform.
With this patch, refactoring the type definition of the Holos/CUE API is
a faster process:
1. Change api/vX/*.go
2. make gencue
3. Render the reference platform
4. Fix error with unknown fields
5. Verify rendered output is the same as before
Closes: #72
This patch establishes the BuildPlan struct as the single API contract
between CUE and Holos. A BuildPlan spec contains a list of each of the
supported holos component types.
The purpose of this data structure is to support the use case of one CUE
instance generating 1 build plan that contains 0..N of each type of
holos component.
The need for multiple components per one CUE instance is to support the
generation of a collection of N~4 flux kustomization resources per
project and P~6 projects built from one CUE instance.
Tested with:
holos render --cluster-name=k2 ~/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/foundation/cloud/init/namespaces/...
Common labels are removed because they're too tightly coupled to the
model of one component per one cue instance.
This patch refactors the go structs used to decode cue output for
processing by the holos cli. For context, the purpose of the structs
is to inform holos how the data from cue should be modeled and
processed as a rendering pipeline that provides rendered yaml to
configure kubernetes api objects.
The structs share common fields in the form of the HolosComponent
embedded struct. The three main holos component kinds today are:
1. KubernetesObjects - CUE outputs a nested map where each value is a
single rendered api object (resource).
2. HelmChart - CUE outputs the chart name and values. Holos calls helm
template to render the chart. Additional api objects may be
overlaid into the rendered output. Kustomize may also optionally be
called at the end of the render pipeline.
3. KustomizeBuild - CUE outputs data to construct a kustomize
kustomization build. The holos component contains raw yaml files to
use as kustomization resources. CUE optionally defines additional
patches, common labels, etc.
With the Go structs, cue may directly import the definitions to more
easily keep the CUE definitions in sync with what the holos cli expects
to receive.
The holos component types may be imported into cue using:
cue get go github.com/holos-run/holos/api/v1alpha1/...
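A sketch of the embedding pattern; see api/v1alpha1 for the
authoritative definitions, these field sets are abbreviated:
```
package v1alpha1

// HolosComponent sketches the fields shared by all component kinds via
// struct embedding.
type HolosComponent struct {
	Kind     string `json:"kind"`
	Metadata struct {
		Name string `json:"name"`
	} `json:"metadata"`
}

// HelmChart renders a helm chart with optional kustomize post-processing.
// The embedded HolosComponent fields are promoted by encoding/json.
type HelmChart struct {
	HolosComponent
	Chart struct {
		Name    string `json:"name"`
		Version string `json:"version"`
	} `json:"chart"`
}
```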
Without this patch ks/prod-iam-zitadel often gets blocked waiting for
jobs that will never complete. In addition, flux should not manage the
zitadel-test-connection Pod which is an unnecessary artifact of the
upstream helm chart.
We'd disable helm hooks, but they're necessary to create the init and
setup jobs.
This patch also changes the default behavior of Kustomizations from
wait: true to wait: false. Waiting is expensive for the api server and
slows down the reconciliation process considerably.
Component authors should use ks.spec.healthChecks to target specific
important resources to watch and wait for.
This patch fixes the problem of the actions runner scale set listener
pod failing every 3 seconds. See
https://github.com/actions/actions-runner-controller/issues/3351
The solution is not ideal: if the primary cluster is down, workflows
will not execute. The primary cluster shouldn't go down, though, so this
is the trade-off: lower log spam and resource usage by eliminating the
failing pods on other clusters, in exchange for lower availability if
the primary cluster is not available.
We could let the pods loop and if the primary is unavailable another
would quickly pick up the role, but it doesn't seem worth it.
The effect of this patch is limited to refreshing credentials only for
namespaces that exist in the local cluster. There is structure in place
in the CUE code to allow for namespaces bound to specific clusters, but
this is used only by the optional Vault component.
This patch was an attempt to work around
https://github.com/actions/actions-runner-controller/issues/3351 by
deploying the runner scale sets into unique namespaces.
This effort was a waste of time; only one listener pod successfully
registered for a given scale set name / group combination.
Because we have only one group named Default we can only have one
listener pod globally for a given scale set name.
Because we want our workflows to execute regardless of the availability
of a single cluster, we're going to let this fail for now. The pod
retries every 3 seconds. When a cluster is destroyed, another cluster
will quickly register.
A follow up patch will look to expand this retry behavior.
This patch migrates the vault component from [holos-infra][1] to a cue
based component. Vault is optional in the reference platform, so this
patch also defines an `#OptionalServices` struct to conditionally manage
a service across multiple clusters in the platform.
The primary use case for optional services is managing a namespace to
provision and provide secrets across clusters.
[1]: https://github.com/holos-run/holos-infra/tree/v0.5.0/components/core/core/vault
Pods are unnecessarily created when deploying helm based holos
components and often fail. Prevent these test pods by disabling helm
hooks with the `--no-hooks` flag.
Closes: #54
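A sketch of the template invocation with hooks disabled; `--no-hooks`
is a documented helm flag, the surrounding function and parameters are
illustrative:
```
package helm

import "os/exec"

// templateCommand sketches the helm template invocation with hooks
// disabled; releaseName, chartDir, and namespace are illustrative.
func templateCommand(releaseName, chartDir, namespace string) *exec.Cmd {
	return exec.Command(
		"helm", "template", releaseName, chartDir,
		"--namespace", namespace,
		"--no-hooks", // prevent hook resources such as test pods
	)
}
```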
Problem:
The standby cluster on k2 fails to start. A pgbackrest pod first
restores the database from S3, then the pgha nodes try to replay the WAL
as part of the standby initialization process. This fails because the
PGDATA directory is not empty.
Solution:
Specify the spec.dataSource field only when the cluster is configured as
a primary cluster.
Result:
Non-primary clusters are standby, they skip the pgbackrest job to
restore from S3 and move straight to patroni replaying the WAL from S3
as part of the pgha pods.
One of the two pgha pods becomes the "standby leader" and restores the
WAL from S3. The other is a cascading standby and then restores the
same WAL from the standby leader.
After 8 minutes both pods are ready.
```
❯ k get pods
NAME READY STATUS RESTARTS AGE
zitadel-pgbouncer-d9f8cffc-j469g 2/2 Running 0 11m
zitadel-pgbouncer-d9f8cffc-xq29g 2/2 Running 0 11m
zitadel-pgha1-27w7-0 4/4 Running 0 11m
zitadel-pgha1-c5qj-0 4/4 Running 0 11m
zitadel-repo-host-0 2/2 Running 0 11m
```
Problem:
The k3 and k4 clusters are getting the Zitadel components which are
really only intended for the core cluster pair.
Solution:
Split the workload subtree into two, named foundation and accounts. The
core cluster pair gets foundation+accounts while the kX clusters get
just the foundation subtree.
Result:
prod-zitadel-iam is no longer managed on k3 and k4
Set the restore point to `time="2024-03-11T17:08:58Z" level=info
msg="crunchy-pgbackrest ends"`, which is just after Gary and Nate
registered and were granted the cluster-admin role.
The [Streaming Standby][standby] architecture requires custom tls certs
for two clusters in two regions to connect to each other.
This patch manages the custom certs following the configuration
described in the article [Using Cert Manager to Deploy TLS for Postgres
on Kubernetes][article].
NOTE: One thing not mentioned anywhere in the crunchy documentation is
how custom tls certs work with pgbouncer. The pgbouncer service uses a
tls certificate issued by the pgo root cert, not by the custom
certificate authority.
For this reason, we use kustomize to patch the zitadel Deployment and
the zitadel-init and zitadel-setup Jobs. The patch projects the ca
bundle from the `zitadel-pgbouncer` secret into the zitadel pods at
/pgbouncer/ca.crt
[standby]: https://access.crunchydata.com/documentation/postgres-operator/latest/architecture/disaster-recovery#streaming-standby-with-an-external-repo
[article]: https://www.crunchydata.com/blog/using-cert-manager-to-deploy-tls-for-postgres-on-kubernetes
A full backup was taken using:
```
kubectl annotate postgrescluster zitadel postgres-operator.crunchydata.com/pgbackrest-backup="$(date)"
```
And completed with:
```
❯ k logs -f zitadel-backup-5r6v-v5jnm
time="2024-03-10T21:52:15Z" level=info msg="crunchy-pgbackrest starts"
time="2024-03-10T21:52:15Z" level=info msg="debug flag set to false"
time="2024-03-10T21:52:15Z" level=info msg="backrest backup command requested"
time="2024-03-10T21:52:15Z" level=info msg="command to execute is [pgbackrest backup --stanza=db --repo=2 --type=full]"
time="2024-03-10T21:55:18Z" level=info msg="crunchy-pgbackrest ends"
```
This patch verifies the point in time backup is robust in the face of
the following operations:
1. pg cluster zitadel was deleted (whole namespace emptied)
2. pg cluster zitadel was re-created _without_ a `dataSource`
3. pgo initialized a new database and backed up the blank database to
S3.
4. pg cluster zitadel was deleted again.
5. pg cluster zitadel was re-created with `dataSource` `options: ["--type=time", "--target=\"2024-03-10 21:56:00+00\""]` (Just after the full backup completed)
6. Restore completed successfully.
7. Applied the holos zitadel component.
8. Zitadel came up successfully and user login worked as expected.
- [x] Perform an in place [restore][restore] from [s3][bucket].
- [x] Set repo1-retention-full to clear warning
[restore]: https://access.crunchydata.com/documentation/postgres-operator/latest/tutorials/backups-disaster-recovery/disaster-recovery#restore-properties
[bucket]: https://access.crunchydata.com/documentation/postgres-operator/latest/tutorials/backups-disaster-recovery/disaster-recovery#cloud-based-data-source
To establish the canonical https://login.ois.run identity issuer on the
core cluster pair.
Custom resources for PGO have been imported with:
timoni mod vendor crds -f deploy/clusters/core2/components/prod-pgo-crds/prod-pgo-crds.gen.yaml
Note, the zitadel tls connection took some considerable effort to get
working. We intentionally use pgo issued certs to reduce the toil of
managing certs issued by cert manager.
The default tls configuration of pgo is pretty good with verify full
enabled.
The core2 cluster cannot provision pvcs because it's using the k8s-dev
pool when it has credentials valid only for the k8s-prod pool.
This patch adds an entry to the platform cluster map to configure the
pool for each cluster, with a default of k8s-dev.
PGO uses plain yaml and kustomize as the recommended installation
method. Holos supports upstream by adding a new PlainFiles component
kind, which simply copies files into place and lets kustomize handle the
generation of the api objects.
Cue is responsible for very little in this kind of component, basically
allowing overlay resources if needed and deferring everything else to
the holos cli.
The holos cli in turn is responsible for executing kubectl kustomize
build on the input directory to produce the rendered output, then writes
the rendered output into place.
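A sketch of that render step; the function name and surrounding
structure are assumptions, but `kubectl kustomize` is the documented
build command:
```
package render

import (
	"bytes"
	"context"
	"fmt"
	"os/exec"
)

// kustomizeBuild renders a PlainFiles component directory, which is
// expected to contain the copied files and a kustomization.yaml, by
// shelling out to kubectl kustomize.
func kustomizeBuild(ctx context.Context, dir string) ([]byte, error) {
	cmd := exec.CommandContext(ctx, "kubectl", "kustomize", dir)
	var stdout, stderr bytes.Buffer
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		return nil, fmt.Errorf("kubectl kustomize: %w: %s", err, stderr.String())
	}
	return stdout.Bytes(), nil
}
```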
Without this patch the arc controller fails to create a listener. The
template for the listener doesn't appear to be configurable from the
chart.
We could patch the listener pod template with kustomize; do this as a
follow-up feature.
With this patch we get the expected two pods in the runner system
namespace:
```
❯ k get pods
NAME READY STATUS RESTARTS AGE
gha-rs-7db9c9f7-listener 1/1 Running 0 43s
gha-rs-controller-56bb9c77d9-6tjch 1/1 Running 0 8s
```
The resource names for the arc controller are too long:
```
❯ k get pods -n arc-systems
NAME                                                              READY   STATUS    RESTARTS   AGE
gha-runner-scale-set-controller-gha-rs-controller-6bdf45bd6jx5n   1/1     Running   0          59m
```
Solve the problem by allowing components to set the release name to
`gha-rs-controller`, which requires an additional field from the cue
code to differentiate it from the chart name.
stderr 'Error: execution error at \(zitadel/templates/secret_zitadel-masterkey.yaml:2:4\): Either set .Values.zitadel.masterkey xor .Values.zitadel.masterkeySecretName'
```
// wait performs health checks for all reconciled resources. If set to true, .spec.healthChecks is ignored.
// Setting this to true for all components generates considerable load on the api server from watches.
// Operations are additionally more complicated when all resources are watched. Consider setting wait true for
// relatively simple components, otherwise target specific resources with spec.healthChecks.
wait: true | *false
dependsOn: [for k, v in _dependsOn {v}, ...]
}
}

// #Kustomize represents the kustomize post processor.
#Kustomize: kc.#Kustomization & {
	_patches: {[_]: kc.#Patch}
	apiVersion: "kustomize.config.k8s.io/v1beta1"
	kind:       "Kustomization"
	// resources are file names holos will use to store intermediate component output for kustomize to post-process (i.e. helm template | kubectl kustomize)
	// See the related resourcesFile field of the holos component.
	resources: [h.#ResourcesFile]
	if len(_patches) > 0 {
		patches: [for v in _patches {v}]
	}
}

// So components don't need to import the package.
// NOTE: Beyond the base reference platform, services should typically be added to #OptionalServices instead of directly to a managed namespace.
// ManagedNamespace is a namespace to manage across all clusters in the holos platform.
#ManagedNamespace: {
	namespace: {
		metadata: {
			name: string
			labels: [string]: string
		}
	}
	// clusterNames represents the set of clusters the namespace is managed on. Usually all clusters.
	clusterNames: [...string]
	for cluster in clusterNames {
		clusters: (cluster): name: cluster
	}
}

// #ManagedNamespaces is the union of all namespaces across all cluster types and optional services.
// Holos adopts the namespace sameness position of SIG Multicluster, refer to https://github.com/kubernetes/community/blob/dd4c8b704ef1c9c3bfd928c6fa9234276d61ad18/sig-multicluster/namespace-sameness-position-statement.md
```
This sub-tree contains Holos Components to manage a [Choria Provisioner][1]
system for the use case of provisioning `holos controller` instances. These
instances are implementations of Machine Room which are in turn implementations
of Choria Server, hence the use of Choria Provisioner.
[1]: https://choria-io.github.io/provisioner/