This patch refactors the API following the [API Best Practices][api]
documentation. The UpdatePlatform method is modeled after a mutating
operation described [by Netflix][nflx] instead of using a REST resource
representation. This makes it much easier to iterate over the fields
that need to be updated as the PlatformUpdateOperation is a flat data
structure while a Platform resource may have nested fields. Nested
fields are more complicated and less clear to handle with a FieldMask.
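To illustrate why the flat shape is easier to apply, here is a hypothetical Python sketch (the field names are invented for illustration, not the actual proto schema): with a flat operation, each FieldMask path is a plain key, so the server can iterate paths directly.

```python
def apply_update(platform: dict, op: dict, paths: list[str]) -> dict:
    """Apply only the masked fields of a flat update operation.

    Because PlatformUpdateOperation is flat, every FieldMask path is a
    top-level key; no nested-field traversal is required.
    """
    for path in paths:
        platform[path] = op[path]
    return platform

# Hypothetical data; only display_name is masked, so name is untouched.
platform = {"id": "example-id", "name": "bare", "display_name": "Bare"}
op = {"name": "ignored", "display_name": "Bare Metal"}
apply_update(platform, op, ["display_name"])
```

A nested Platform resource would instead require splitting each path on `.` and walking the message tree, which is what the flat operation avoids.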
This patch also adds a snackbar message on save. Previously, the save
button gave no indication of success or failure. This patch fixes the
problem by adding a snackbar message that pops up at the bottom of the
screen.
When the snackbar message is dismissed or times out the save button is
re-enabled.
[api]: https://protobuf.dev/programming-guides/api/
[nflx]: https://netflixtechblog.com/practical-api-design-at-netflix-part-2-protobuf-fieldmask-for-mutation-operations-2e75e1d230e4
Examples:
FieldMask for ListPlatforms
```
grpcurl -H "x-oidc-id-token: $(holos token)" -d @ ${HOLOS_SERVER##*/} holos.platform.v1alpha1.PlatformService.ListPlatforms <<EOF
{
"org_id": "018f36fb-e3f7-7f7f-a1c5-c85fb735d215",
"field_mask": { "paths": ["id","name"] }
}
EOF
```
```json
{
"platforms": [
{
"id": "018f36fb-e3ff-7f7f-a5d1-7ca2bf499e94",
"name": "bare"
},
{
"id": "018f6b06-9e57-7223-91a9-784e145d998c",
"name": "gary"
},
{
"id": "018f6b06-9e53-7223-8ae1-1ad53d46b158",
"name": "jeff"
},
{
"id": "018f6b06-9e5b-7223-8b8b-ea62618e8200",
"name": "nate"
}
]
}
```
Closes: #171
Problem:
When the ingress default Gateway AuthorizationPolicy/authpolicy-custom
rule is in place the choria machine room holos controller fails to
connect to the provisioner broker with the following error:
```
❯ holos controller run --config=agent.cfg
WARN[0000] Starting controller version 0.68.1 with config file /home/jeff/workspace/holos-run/holos/hack/choria/agent/agent.cfg leader=false
WARN[0000] Switching to provisioning configuration due to build defaults and missing /home/jeff/workspace/holos-run/holos/hack/choria/agent/agent.cfg
WARN[0000] Setting anonymous TLS mode during provisioning component=server connection=coffee.home identity=coffee.home
WARN[0000] Initial connection to the Broker failed on try 1: invalid websocket connection component=server connection=coffee.home identity=coffee.home
WARN[0000] Initial connection to the Broker failed on try 2: invalid websocket connection component=server connection=coffee.home identity=coffee.home
WARN[0002] Initial connection to the Broker failed on try 3: invalid websocket connection component=server connection=coffee.home identity=coffee.home
```
This problem occurs because the provisioning token URL is set to
`wss://jeff.provision.dev.k2.holos.run:443`, which includes an explicit
port number.
Solution:
Follow the upstream Istio guidance in [Writing Host Match Policies][1]
to match host headers with or without the port specified.
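Following that guidance, the policy matches both forms of the host header. A hedged sketch (the policy name and host values are illustrative):

```
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: authpolicy-custom
  namespace: istio-ingress
spec:
  rules:
    - to:
        - operation:
            # Match the host header with or without an explicit port.
            hosts:
              - "jeff.provision.dev.k2.holos.run"
              - "jeff.provision.dev.k2.holos.run:*"
```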
Result:
The controller is able to connect to the provisioner broker:
[1]: https://istio.io/latest/docs/ops/best-practices/security/#writing-host-match-policies
This patch fixes an error where the Istio ingress gateway proxy failed
to verify the TLS certificate presented by the Choria broker upstream
server.
Choria broker logs:

```
kubectl logs choria-broker-0
level=error msg="websocket: TLS handshake error from 10.244.1.190:36142: remote error: tls: unknown certificate\n"
```

Istio ingress logs:

```
kubectl -n istio-ingress logs -l app=istio-ingressgateway -f | grep --line-buffered '^{' | jq .
"upstream_transport_failure_reason": "TLS_error:|268435581:SSL_routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED:TLS_error_end:TLS_error_end"
```

Client curl output:

```
curl https://jeff.provision.dev.k2.holos.run
upstream connect error or disconnect/reset before headers. retried and the latest reset reason: remote connection failure, transport failure reason: TLS_error:|268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED:TLS_error_end:TLS_error_end
```
Explanation of error:
Istio defaults to expecting a TLS certificate matching the downstream
host/authority, which isn't how we've configured Choria.
Refer to [ClientTLSSettings][1]
> A list of alternate names to verify the subject identity in the
> certificate. If specified, the proxy will verify that the server
> certificate’s subject alt name matches one of the specified values. If
> specified, this list overrides the value of subject_alt_names from the
> ServiceEntry. If unspecified, automatic validation of upstream presented
> certificate for new upstream connections will be done based on the
> downstream HTTP host/authority header, provided
> VERIFY_CERTIFICATE_AT_CLIENT and ENABLE_AUTO_SNI environmental variables
> are set to true.
[1]: https://istio.io/latest/docs/reference/config/networking/destination-rule/#ClientTLSSettings
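Given that behavior, one fix is to pin the expected SANs in a DestinationRule. A hedged sketch (the resource name, host, and SAN values are illustrative, not our actual config):

```
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: choria-broker
spec:
  host: choria-broker.choria.svc.cluster.local
  trafficPolicy:
    tls:
      mode: SIMPLE
      # Verify against the SANs actually present in the broker cert
      # instead of the downstream host/authority header.
      subjectAltNames:
        - choria-broker.choria.svc.cluster.local
```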
This patch configures ArgoCD to log in via PKCE.
Note the changes are primarily in platform.site.cue and ensuring the
emailDomain is set properly. Note too the redirect URL needs to be
`/pkce/verify` when PKCE is enabled. Finally, if the setting is
reconfigured make sure to clear cookies otherwise the incorrect
`/auth/callback` path may be used.
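A sketch of the relevant `argocd-cm` settings (the URL, issuer, and client ID are illustrative; check the ArgoCD OIDC documentation for your version):

```
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  url: https://argocd.ois.run
  oidc.config: |
    name: ZITADEL
    issuer: https://login.ois.run
    clientID: argocd
    enablePKCEAuthentication: true
    # With PKCE enabled, the redirect URL registered with the IdP must
    # end in /pkce/verify, not /auth/callback.
```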
Problem:
Port names in the default Gateway.spec.servers.port field must be unique
across all servers associated with the workload.
Solution:
Append the fully qualified domain name with dots replaced with hyphens.
Result:
Port name is unique.
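The naming scheme can be sketched as follows (a hypothetical helper; the exact prefix convention is assumed):

```python
def port_name(protocol: str, fqdn: str) -> str:
    """Build a unique Gateway server port name from the host FQDN.

    Replacing dots with hyphens keeps the name a valid Kubernetes port
    name while making it unique per host across all servers.
    """
    return f"{protocol}-{fqdn.replace('.', '-')}"

name = port_name("https", "login.ois.run")
```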
Problem:
The default gateway in one cluster gets server entries for all hosts in
the platform. This makes the list unnecessarily large, with entries for
clusters that should not be handled by the current cluster.
For example, the k2 cluster has gateway entries to route hosts for k1,
k3, k4, k5, etc...
Solution:
Add a field to the CertInfo definition representing which clusters the
host is valid on.
Result:
Hosts which are valid on all clusters, e.g. login.ois.run, have all
project clusters added to the clusters field of the CertInfo. Hosts
which are valid on a single cluster have the corresponding single entry
added.
When building resources, holos components should check if `#ClusterName`
is a valid field of the CertInfo.clusters field. If so, the host is
valid for the current cluster. If not, the host should be omitted from
the current cluster.
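A hedged CUE sketch of that check (field names follow the description above; `Hosts` and `#ClusterName` are assumed to be in scope):

```
// Include a Gateway server entry only for hosts whose CertInfo lists
// the current cluster in its clusters field.
servers: {
	for fqdn, info in Hosts
	for cluster, _ in info.clusters
	if cluster == #ClusterName {
		"\(fqdn)": hosts: [fqdn]
	}
}
```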
Doing so forces unnecessary hosts onto some projects. For example,
iam.ois.run is useless for the iam project; its primary host is login,
which builds login.ois.run. Some projects may not need any hosts at
all.
Better to let the user specify `project: foo: hosts: foo: _` if they
want it.
This patch loops over every Gateway.spec.servers entry in the default
gateway and manages an ExternalSecret to sync the credential from the
provisioner cluster.
Problem:
A Holos Component is created for each project stage, but all hosts for
all stages in the project are added. This creates duplicates.
Solution:
Sort project hosts by their stage and map the holos component for a
stage to the hosts for that stage.
Result:
Duplicates are eliminated, the prod certs are not in the dev holos
component and vice-versa.
This patch provisions wildcard certs in the provisioning cluster. The
CN matches the project stage host's global hostname without any cluster
qualifiers.
The use of a wildcard in place of the environment name dns segment at
the leftmost position of the fully qualified dns name enables additional
environments to be configured without reissuing certificates.
This is to avoid the 100 name per cert limit in LetsEncrypt.
Mapping each project host fqdn to the stage is unnecessary. The list of
gateway servers is constructed from each FQDN in the project.
This patch removes the unnecessary struct mappings.
Problem:
It's difficult to map and reduce the collection of project hosts when
configuring related Certificate, Gateway.spec.servers, VirtualService,
and auth proxy cookie domain settings.
Solution:
Define #ProjectHosts, which takes a project and provides Hosts, a
struct with a fqdn key and a #CertInfo value. The #CertInfo definition
is intended to provide everything needed to reduce the Hosts property to
structs useful for the problematic resources mentioned previously.
Result:
Gateway.spec.servers are mapped using #ProjectHosts.
Next step is to map the Certificate resources on the provisioner
cluster.
Problem:
Adding environments to a project causes certs to be re-issued.
Solution:
Enable wildcard certs for per-environment namespaces like jeff, gary,
nate, etc...
Result:
Environments can be added to a project stage without needing the cert to
be re-issued.
This patch avoids LetsEncrypt rate limits by consolidating multiple dns
names into one certificate.
For each project host, create a certificate for each stage in the
project. The certificate contains the dns names for all clusters and
environments associated with that stage and host.
This can become quite a list; the limit is 100 dnsNames.
For the Holos project which has 7 clusters and 4 dev environments, the
number of dns names is 32 (4 envs + 4 envs * 7 clusters = 32 dns names).
Still, a much needed improvement because we're limited to 50 certs per
week.
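The count above can be checked with a quick calculation:

```python
envs = 4      # dev environments in the Holos project
clusters = 7  # clusters in the Holos project

# One global name per environment plus one per environment per cluster,
# matching the figure quoted above and well under the 100-name limit.
dns_names = envs + envs * clusters
```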
It may be worth considering wildcards for the per-developer
environments, which are the ones we'll likely spin up the most
frequently.
This patch is a partial step toward getting the choria broker up
and running in my own namespace. The choria broker is necessary for
provisioning machine room agents such as the holos controller.
This patch adds an initial holos controller subcommand. The machine
room agent starts, but doesn't yet provision because we haven't deployed
the provisioning infrastructure yet.
Configure NATS in a 3 Node deployment with resolver authentication using
an Operator JWT.
The operator secret nkeys are stored in the provisioner cluster. Get
them with:
```
holos get secret -n jeff-holos nats-nsc --print-key nsc.tgz | tar -tvzf-
```
The upstream nats charts don't specify namespaces for each attribute.
This works with helm update, but not helm template which holos uses to
render the yaml.
The missing namespace causes flux to fail.
This patch uses the flux kustomization to add the target namespace to
all resources.
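A sketch of the Flux Kustomization (the name, namespace, path, and source are illustrative):

```
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: nats
  namespace: flux-system
spec:
  interval: 5m
  path: ./deploy/nats
  prune: true
  sourceRef:
    kind: GitRepository
    name: holos
  # Set metadata.namespace on every rendered resource that lacks one so
  # the helm template output applies cleanly.
  targetNamespace: jeff-holos
```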
When rendering a holos component which contains more than one helm chart, rendering fails. It should succeed.
```
holos render --cluster-name=k2 /home/jeff/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/holos/... --log-level debug
```
```
9:03PM ERR could not execute version=0.64.2 err="could not rename: rename /home/jeff/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/holos/nats/envs/vendor553679311 /home/jeff/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/holos/nats/envs/vendor: file exists" loc=helm.go:145
```
This patch fixes the problem by moving each child item of the temporary
directory that charts are installed into, instead of renaming the parent
directory, which fails when the target already exists.
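The approach can be sketched in Python (the actual fix is in helm.go; this is an illustrative equivalent):

```python
import os
import shutil


def move_children(src: str, dst: str) -> None:
    """Move each child of src into dst instead of renaming src itself.

    Renaming the parent fails with "file exists" when dst is already a
    directory; moving children one at a time merges into it instead.
    """
    os.makedirs(dst, exist_ok=True)
    for name in os.listdir(src):
        shutil.move(os.path.join(src, name), os.path.join(dst, name))
    os.rmdir(src)
```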
Without this patch users encounter an error from istio because it does
not have a valid Jwks from ZITADEL to verify the request when processing
a `RequestAuthentication` policy.
Fixes error `AuthProxy JWKS Error - Jwks doesn't have key to match kid or alg from Jwt`.
Occurs when accessing a protected URL for the first time after tokens have expired.
Grafana does not yet have the istio sidecar. Prometheus is accessible
through the auth proxy. Cert manager is added to the workload clusters
so tls certs can be issued for webhooks, the kube-prom-stack helm chart
uses cert manager for this purpose.
With this patch Grafana is integrated with OIDC and I'm able to log in
as an Administrator.
Problem:
The VirtualService that catches auth routes for paths, e.g.
`/holos/authproxy/istio-ingress` is bound to the default gateway which
no longer exists because it has no hosts.
Solution:
It's unnecessary and complicated to create a Gateway for every project.
Instead, put all server entries into one `default` gateway and
consolidate the list using CUE.
Result:
It's easier to reason about this system. There is only one ingress
gateway, `default` and everything gets added to it. VirtualServices
need only bind to this gateway, which has a hosts entry appropriately
namespaced for the project.
Problem:
The ZITADEL database isn't restoring into the prod-iam namespace after
moving from prod-iam-zitadel because no backup exists at the bucket
path.
Solution:
Hard-code the path to the old namespace to restore the database. We'll
figure out how to move the backups to the new location in a follow up
change.
The `prod-platform-gateway` kustomization is reconciling early:

```
ExternalSecret/istio-ingress/argocd.ois.run dry-run failed: failed to
get API group resources: unable to retrieve the complete list of server
APIs: external-secrets.io/v1beta1: the server could not find the
requested resource
```