Problem:
When the default ingress Gateway's AuthorizationPolicy/authpolicy-custom
rule is in place, the choria machine room holos controller fails to
connect to the provisioner broker with the following error:
```
❯ holos controller run --config=agent.cfg
WARN[0000] Starting controller version 0.68.1 with config file /home/jeff/workspace/holos-run/holos/hack/choria/agent/agent.cfg leader=false
WARN[0000] Switching to provisioning configuration due to build defaults and missing /home/jeff/workspace/holos-run/holos/hack/choria/agent/agent.cfg
WARN[0000] Setting anonymous TLS mode during provisioning component=server connection=coffee.home identity=coffee.home
WARN[0000] Initial connection to the Broker failed on try 1: invalid websocket connection component=server connection=coffee.home identity=coffee.home
WARN[0000] Initial connection to the Broker failed on try 2: invalid websocket connection component=server connection=coffee.home identity=coffee.home
WARN[0002] Initial connection to the Broker failed on try 3: invalid websocket connection component=server connection=coffee.home identity=coffee.home
```
This happens because the provisioning token URL is set to
`wss://jeff.provision.dev.k2.holos.run:443`, which includes an explicit
port number.
Solution:
Follow the upstream Istio guidance in [Writing Host Match Policies][1]
and match Host headers both with and without the port specified.
Result:
The controller is able to connect to the provisioner broker:
[1]: https://istio.io/latest/docs/ops/best-practices/security/#writing-host-match-policies
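Following that guidance, each host appears in the rule both bare and with a wildcard port. A minimal sketch of such a rule, assuming a CUSTOM-action policy guarding the default ingress gateway (the hostname, namespace, and provider name are illustrative):
```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: authpolicy-custom
  namespace: istio-ingress
spec:
  action: CUSTOM
  provider:
    name: auth-proxy  # illustrative extauthz provider name
  rules:
    - to:
        - operation:
            # Match the Host header with or without an explicit port,
            # e.g. a websocket dial to wss://...:443.
            hosts:
              - "jeff.provision.dev.k2.holos.run"
              - "jeff.provision.dev.k2.holos.run:*"
```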
This patch configures ArgoCD to log in via PKCE.
Note the changes are primarily in platform.site.cue and in ensuring
emailDomain is set properly. Note too that the redirect URL needs to be
`/pkce/verify` when PKCE is enabled. Finally, if the setting is
reconfigured, make sure to clear cookies; otherwise the incorrect
`/auth/callback` path may be used.
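A hedged sketch of the relevant argocd-cm settings, assuming a ZITADEL issuer; the issuer URL and client ID are placeholders, and the redirect URI registered with the IdP must end in `/pkce/verify`:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  url: https://argocd.example.com
  oidc.config: |
    name: ZITADEL
    issuer: https://login.example.com  # placeholder issuer
    clientID: "123456789"              # placeholder client ID
    enablePKCEAuthentication: true
    requestedScopes: ["openid", "profile", "email", "groups"]
```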
Doing so forces unnecessary hosts onto some projects. For example,
iam.ois.run is useless for the iam project, whose primary host is
login, building login.ois.run.
Some projects may not need any hosts at all.
Better to let the user specify `project: foo: hosts: foo: _` if they
want it.
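A hypothetical CUE sketch of the opt-in shape; the #Platform definition and field names are illustrative:
```cue
// Hosts are opt-in per project rather than derived automatically.
#Platform: project: [Name=string]: {
	hosts: [Host=string]: {...}
}

platform: #Platform & {
	// The iam project wants only the login host: login.ois.run
	project: iam: hosts: login: _
	// Other projects declare no hosts at all.
	project: nats: {}
}
```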
This patch loops over every Gateway.spec.servers entry in the default
gateway and manages an ExternalSecret to sync the credential from the
provisioner cluster.
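A hypothetical CUE comprehension illustrating the loop; the gateway path and secret store name are assumptions:
```cue
// One ExternalSecret per TLS credential referenced by the default
// Gateway, synced from the provisioner cluster's secret store.
externalSecrets: {
	for server in gateway.default.spec.servers {
		"\(server.tls.credentialName)": {
			apiVersion: "external-secrets.io/v1beta1"
			kind:       "ExternalSecret"
			metadata: name: server.tls.credentialName
			spec: {
				secretStoreRef: {kind: "ClusterSecretStore", name: "provisioner"}
				target: name: server.tls.credentialName
			}
		}
	}
}
```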
This patch provisions wildcard certs in the provisioning cluster. The
CN matches the project stage host's global hostname without any cluster
qualifiers.
The use of a wildcard in place of the environment name dns segment at
the leftmost position of the fully qualified dns name enables additional
environments to be configured without reissuing certificates.
This is to avoid the 100-names-per-certificate limit in Let's Encrypt.
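A hedged cert-manager sketch of such a certificate; the hostnames and issuer name are illustrative:
```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: app-dev-holos-run
  namespace: istio-ingress
spec:
  secretName: app-dev-holos-run
  issuerRef:
    kind: ClusterIssuer
    name: letsencrypt
  commonName: app.dev.holos.run
  dnsNames:
    - app.dev.holos.run
    # Wildcard in the leftmost (environment) segment covers
    # jeff.app.dev.holos.run, gary.app.dev.holos.run, etc.
    - "*.app.dev.holos.run"
```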
Problem:
It's difficult to map and reduce the collection of project hosts when
configuring related Certificate, Gateway.spec.servers, VirtualService,
and auth proxy cookie domain settings.
Solution:
Define #ProjectHosts, which takes a project and provides Hosts, a
struct with a fqdn key and a #CertInfo value. The #CertInfo definition
is intended to provide everything needed to reduce the Hosts property to
structs useful for the problematic resources mentioned previously.
Result:
Gateway.spec.servers are mapped using #ProjectHosts
Next step is to map the Certificate resources on the provisioner
cluster.
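A hypothetical sketch of the shape described above; fields beyond fqdn are illustrative:
```cue
#CertInfo: {
	fqdn:      string
	namespace: string
	stage:     string
	...
}

#ProjectHosts: {
	project: {...} // the project being mapped
	// One entry per fully qualified domain name.
	Hosts: [FQDN=string]: #CertInfo & {fqdn: FQDN}
}

// Hosts can then be reduced per consumer, e.g. the hosts list for a
// Gateway.spec.servers entry:
// [for fqdn, info in Hosts {"\(info.namespace)/\(fqdn)"}]
```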
Problem:
Adding environments to a project causes certs to be re-issued.
Solution:
Enable wildcard certs for per-environment namespaces like jeff, gary,
nate, etc.
Result:
Environments can be added to a project stage without needing the cert to
be re-issued.
This patch avoids Let's Encrypt rate limits by consolidating multiple
DNS names into one certificate.
For each project host, create a certificate for each stage in the
project. The certificate contains the dns names for all clusters and
environments associated with that stage and host.
This can become quite a long list; the limit is 100 dnsNames.
For the Holos project, which has 7 clusters and 4 dev environments, the
number of DNS names is 32 (4 envs + 4 envs × 7 clusters = 32 DNS names).
Still, a much-needed improvement because we're limited to 50 certs per
week.
It may be worth considering wildcards for the per-developer
environments, which are the ones we'll likely spin up the most
frequently.
This patch is a partial step toward getting the choria broker up
and running in my own namespace. The choria broker is necessary for
provisioning machine room agents such as the holos controller.
This patch adds an initial holos controller subcommand. The machine
room agent starts, but doesn't yet provision because we haven't deployed
the provisioning infrastructure yet.
Configure NATS in a 3-node deployment with resolver authentication using
an Operator JWT.
The operator secret nkeys are stored in the provisioner cluster. Get
them with:
```
holos get secret -n jeff-holos nats-nsc --print-key nsc.tgz | tar -tvzf-
```
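For reference, a resolver-mode broker configuration might look roughly like this nats-server fragment; all paths, route names, and the system account value are placeholders:
```
# nats-server configuration sketch (placeholder paths and values)
operator: /etc/nats/operator.jwt
system_account: SYS_ACCOUNT_PUBLIC_NKEY

# Full resolver: account JWTs are pushed to and served from disk.
resolver {
  type: full
  dir: /data/resolver
}

cluster {
  name: nats
  routes: [
    nats://nats-0.nats:6222
    nats://nats-1.nats:6222
    nats://nats-2.nats:6222
  ]
}
```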
The upstream nats charts don't specify a namespace for each resource.
This works with `helm upgrade`, but not with `helm template`, which
holos uses to render the YAML.
The missing namespace causes flux to fail.
This patch uses the flux kustomization to add the target namespace to
all resources.
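The Flux `Kustomization` resource supports this directly via `spec.targetNamespace`. A sketch with placeholder names and paths:
```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: nats
  namespace: flux-system
spec:
  interval: 10m
  path: ./clusters/k2/nats  # placeholder path
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  # Applied to every rendered resource, supplying the namespace the
  # upstream chart omits.
  targetNamespace: jeff-holos
```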
When rendering a holos component that contains more than one helm chart, rendering fails. It should succeed.
```
holos render --cluster-name=k2 /home/jeff/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/holos/... --log-level debug
```
```
9:03PM ERR could not execute version=0.64.2 err="could not rename: rename /home/jeff/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/holos/nats/envs/vendor553679311 /home/jeff/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/holos/nats/envs/vendor: file exists" loc=helm.go:145
```
This patch fixes the problem by moving each child item of the temporary
directory charts are installed into, rather than the directory itself.
This avoids renaming the parent when the parent target already exists.
Without this patch, users encounter an error from Istio because it does
not have a valid JWKS from ZITADEL to verify the request when processing
a `RequestAuthentication` policy.
Fixes error `AuthProxy JWKS Error - Jwks doesn't have key to match kid or alg from Jwt`.
Occurs when accessing a protected URL for the first time after tokens have expired.
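The policy needs a reachable JWKS endpoint for the issuer. A hedged sketch; the issuer URL is a placeholder, and `/oauth/v2/keys` is ZITADEL's published JWKS path:
```yaml
apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: zitadel
  namespace: istio-ingress
spec:
  selector:
    matchLabels:
      istio: ingressgateway
  jwtRules:
    - issuer: https://login.example.com   # placeholder issuer
      jwksUri: https://login.example.com/oauth/v2/keys
      fromHeaders:
        - name: x-oidc-id-token
```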
Grafana does not yet have the istio sidecar. Prometheus is accessible
through the auth proxy. Cert-manager is added to the workload clusters
so TLS certs can be issued for webhooks; the kube-prom-stack helm chart
uses cert-manager for this purpose.
With this patch Grafana is integrated with OIDC and I'm able to log in
as an Administrator.
Problem:
The VirtualService that catches auth routes for paths, e.g.
`/holos/authproxy/istio-ingress`, is bound to the default gateway, which
no longer exists because it has no hosts.
Solution:
It's unnecessary and complicated to create a Gateway for every project.
Instead, put all server entries into one `default` gateway and
consolidate the list using CUE.
Result:
It's easier to reason about this system. There is only one ingress
gateway, `default`, and everything gets added to it. VirtualServices
need only bind to this gateway, which has a hosts entry appropriately
namespaced for the project.
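A hypothetical CUE sketch of the consolidation; the projects structure and field names are illustrative:
```cue
// Every project's hosts collapse into the servers list of the single
// default Gateway.
gateway: default: spec: servers: [
	for project in projects
	for host in project.hosts {
		port: {number: 443, name: "https-\(host.fqdn)", protocol: "HTTPS"}
		// Namespaced hosts entry, e.g. "prod-iam/login.ois.run"
		hosts: ["\(host.namespace)/\(host.fqdn)"]
		tls: {mode: "SIMPLE", credentialName: host.fqdn}
	},
]
```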
Problem:
The ZITADEL database isn't restoring into the prod-iam namespace after
moving from prod-iam-zitadel because no backup exists at the bucket
path.
Solution:
Hard-code the path to the old namespace to restore the database. We'll
figure out how to move the backups to the new location in a follow up
change.
The `prod-platform-gateway` kustomization is reconciling early:
```
ExternalSecret/istio-ingress/argocd.ois.run dry-run failed: failed to
get API group resources: unable to retrieve the complete list of server
APIs: external-secrets.io/v1beta1: the server could not find the
requested resource
```
This patch moves ZITADEL from the prod-iam-zitadel namespace to the
project-managed prod-iam namespace, which is the prod environment of
the prod stage of the iam project.
Using the Helm chart so we can inject the istio sidecar with a kustomize
patch and tweak the configs for OIDC integration.
Login works and the istio sidecar is injected. Unfortunately, ArgoCD
can only be configured with one domain: it's not accessible at
argocd.ois.run, only at argocd.k2.ois.run (or whichever cluster it's
installed into). Ideally it would use the Host header, but it does not.
RBAC is not implemented but the User Info endpoint does have group
membership so this shouldn't be a problem to implement.
This patch defines a #AuthPolicyRules struct which excludes hosts from
the blanket auth policy and includes them in specialized auth policies.
The purpose is to handle special cases like vault requests which have an
`X-Vault-Token` and `X-Vault-Request` header.
Vault does not use JWTs, so we cannot verify its tokens in the mesh; we
have to pass them along to the backend.
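The exclusion side can be sketched as a `notHosts` entry in the blanket CUSTOM policy, with the excluded hosts then covered by specialized policies; hostnames and the provider name are illustrative:
```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: authpolicy-custom
  namespace: istio-ingress
spec:
  action: CUSTOM
  provider:
    name: auth-proxy  # illustrative provider name
  rules:
    - to:
        - operation:
            # Vault requests carry X-Vault-Token, not a JWT, so they
            # bypass the auth proxy and go straight to the backend.
            notHosts:
              - "vault.example.com"
              - "vault.example.com:*"
```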
Closes: #93
The ingress gateway auth proxy callback conflicts with the project stage
auth proxy callback for the same backend Host: header value.
This patch disambiguates them by the namespace the auth proxy resides
in.
This patch adds a `RequestAuthentication` and `AuthorizationPolicy` rule
to protect all requests flowing through the default ingress gateway.
Consider a browser request for httpbin.k2.example.com representing any
arbitrary host with a valid destination inside the service mesh. The
default ingress gateway will check whether there is already an
x-oidc-id-token header and, if so, validate that the token was issued by
ZITADEL and that the aud value contains the ZITADEL project number.
If the header is not present, the request is forwarded to oauth2-proxy
in the istio-ingress namespace. This auth proxy is configured to start
the oidc auth flow with a redirect back to /holos/oidc/callback of the
Host: value originally provided in the browser request.
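The validation half of that flow can be sketched as an ALLOW policy on the gateway; the issuer and project number are placeholders, and `request.auth.audiences` is the Istio condition key that inspects aud:
```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: require-id-token
  namespace: istio-ingress
spec:
  selector:
    matchLabels:
      istio: ingressgateway
  action: ALLOW
  rules:
    - from:
        - source:
            # Set by the RequestAuthentication policy on valid tokens.
            requestPrincipals: ["https://login.example.com/*"]
      when:
        - key: request.auth.audiences
          values: ["123456789"]  # placeholder ZITADEL project number
```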
Closes: #82
This patch adds an ingress gateway extauthz provider. Because ZITADEL
returns all applications associated with a ZITADEL project in the aud
claim, it makes sense to have one ingress auth proxy at the initial
ingress gateway so we can get the ID token in the request header for
backend namespaces to match using `RequestAuthentication` and
`AuthorizationPolicy`.
This change likely makes the additional per-stage auth proxies
unnecessary and over-engineered. Backend namespaces will have access to
the ID token.
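The provider itself is registered in meshConfig. A hedged sketch, assuming oauth2-proxy listens on its default port 4180 in the istio-ingress namespace:
```yaml
meshConfig:
  extensionProviders:
    - name: auth-proxy  # referenced by CUSTOM AuthorizationPolicies
      envoyExtAuthzHttp:
        service: oauth2-proxy.istio-ingress.svc.cluster.local
        port: 4180
        includeRequestHeadersInCheck: ["cookie", "authorization"]
        # Forward the ID token to backends on allow.
        headersToUpstreamOnAllow: ["x-oidc-id-token", "authorization"]
        headersToDownstreamOnDeny: ["set-cookie", "content-type"]
```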