Commit Graph

164 Commits

Author SHA1 Message Date
Jeff McCune
ce8bc798f6 (#133) Exclude nats and provision hosts from the auth proxy
Problem:
The identity aware auth proxy attached to the default gateway is
blocking access to NATS and the Choria Provisioner cluster.

Solution:
Add configuration that causes the project hosts to get added to the
exclusion list of the AuthorizationPolicy/authproxy-custom rule.

Result:
Requests bypass the auth proxy and go straight to the backend.  The
rules look like:

    kubectl get authorizationpolicy authproxy-custom -o yaml

```yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: authproxy-custom
  namespace: istio-ingress
  labels:
    app.kubernetes.io/name: authproxy-custom
    app.kubernetes.io/part-of: istio-ingressgateway
spec:
  action: CUSTOM
  provider:
    name: ingressauth
  rules:
  - to:
    - operation:
        notHosts:
        - login.ois.run
        - vault.core.ois.run
        - provision.holos.run
        - nats.holos.run
        - provision.dev.holos.run
        - nats.dev.holos.run
        - jeff.provision.dev.holos.run
        - jeff.nats.dev.holos.run
        - gary.provision.dev.holos.run
        - gary.nats.dev.holos.run
        - nate.provision.dev.holos.run
        - nate.nats.dev.holos.run
        - provision.k2.holos.run
        - nats.k2.holos.run
        - provision.dev.k2.holos.run
        - nats.dev.k2.holos.run
        - jeff.provision.dev.k2.holos.run
        - jeff.nats.dev.k2.holos.run
        - gary.provision.dev.k2.holos.run
        - gary.nats.dev.k2.holos.run
        - nate.provision.dev.k2.holos.run
        - nate.nats.dev.k2.holos.run
    when:
    - key: request.headers[x-oidc-id-token]
      notValues:
      - '*'
  selector:
    matchLabels:
      istio: ingressgateway
```
2024-04-19 06:32:11 -07:00
Jeff McCune
996195d651 (#137) Fix ArgoCD PKCE login small comment 2024-04-18 14:39:13 -07:00
Jeff McCune
f00b29d3a3 (#137) Fix ArgoCD PKCE login
This patch configures ArgoCD to log in via PKCE.

Note the changes are primarily in platform.site.cue and ensuring the
emailDomain is set properly.  Note too the redirect URL needs to be
`/pkce/verify` when PKCE is enabled.  Finally, if the setting is
reconfigured make sure to clear cookies otherwise the incorrect
`/auth/callback` path may be used.
2024-04-18 14:37:06 -07:00
Jeff McCune
0e4bf3c144 (#101) Manage certs on all clusters
We're going to do a big re-issue so might as well do it once so we don't
have to re-issue again to add more clusters to existing projects.
2024-04-18 10:16:55 -07:00
Jeff McCune
1241c74b41 (#101) Do not add the project name as a project host
Doing so forces unnecessary hosts for some projects.  For example,
iam.ois.run is useless for the iam project, the primary project host is
login to build login.ois.run.

Some projects may not need any hosts as well.

Better to let the user specify `project: foo: hosts: foo: _` if they
want it.
2024-04-18 09:59:21 -07:00
Jeff McCune
44fea098de (#101) Manage an ExternalSecret for every Server in the default Gateway
This patch loops over every Gateway.spec.servers entry in the default
gateway and manages an ExternalSecret to sync the credential from the
provisioner cluster.
2024-04-18 09:53:39 -07:00
Jeff McCune
a1b2179442 (#101) Remove holos-saas-certs holos component
No longer needed now that project host certs are using wildcards and
organized nicely.
2024-04-18 06:32:06 -07:00
Jeff McCune
cffc430738 (#101) Provision wildcard certs for all Gateway servers
This patch provisions wildcard certs in the provisioning cluster.  The
CN matches the project stage host global hostname without any cluster
qualifiers.

The use of a wildcard in place of the environment name dns segment at
the leftmost position of the fully qualified dns name enables additional
environments to be configured without reissuing certificates.

This is to avoid the 100 name per cert limit in LetsEncrypt.
2024-04-18 06:26:29 -07:00
Jeff McCune
9d1e77c00f (#101) Define #ProjectHosts to manage project hosts
Problem:
It's difficult to map and reduce the collection of project hosts when
configuring related Certificate, Gateway.spec.servers, VirtualService,
and auth proxy cookie domain settings.

Solution:
Define #ProjectHosts which takes a project and provides Hosts which is a
struct with a fqdn key and a #CertInfo value.  The #CertInfo definition
is intended to provide everything need to reduce the Hosts property to
structs usful for the problematic resources mentioned previously.

Result:
Gateway.spec.servers are mapped using #ProjectHosts

Next step is to map the Certificate resources on the provisioner
cluster.
2024-04-17 21:59:04 -07:00
Jeff McCune
2050abdc6c (#101) Add wildcard support to project certs
Problem:
Adding environments to a project causes certs to be re-issued.

Solution:
Enable wildcard certs for per-environment namespaces like jeff, gary,
nate, etc...

Result:
Environments can be added to a project stage without needing the cert to
be re-issued.
2024-04-17 12:32:44 -07:00
Jeff McCune
3ea013c503 (#101) Consolidate certificates by project stage
This patch avoids LetsEncrypt rate limits by consolidating multiple dns
names into one certificate.

For each project host, create a certificate for each stage in the
project.  The certificate contains the dns names for all clusters and
environments associated with that stage and host.

This can become quite a list, the limit is 100 dnsNames.

For the Holos project which has 7 clusters and 4 dev environments, the
number of dns names is 32 (4 envs + 4 envs * 7 clusters = 32 dns names).

Still, a much needed improvement because we're limited to 50 certs per
week.

It may be worth considering wildcards for the per-developer
environments, which are the ones we'll likely spin up the most
frequently.
2024-04-17 11:58:46 -07:00
Jeff McCune
309db96138 (#133) Choria Broker for Holos Controller provisioning
This patch is a partial step toward getting the choria broker up
and running in my own namespace.  The choria broker is necessary for
provisioning machine room agents such as the holos controller.
2024-04-17 08:48:31 -07:00
Jeff McCune
ab9bca0750 (#132) Controller Subcommand
This patch adds an initial holos controller subcommand.  The machine
room agent starts, but doesn't yet provision because we haven't deployed
the provisioning infrastructure yet.
2024-04-16 15:40:25 -07:00
Jeff McCune
ac2be67c3c (#130) NATS deployment with operator jwt
Configure NATS in a 3 Node deployment with resolver authentication using
an Operator JWT.

The operator secret nkeys are stored in the provisioner cluster.  Get
them with:

    holos get secret -n jeff-holos nats-nsc --print-key nsc.tgz | tar -tvzf-
2024-04-15 17:02:18 -07:00
Jeff McCune
2954a57872 (#120) Fix NATS target namespace
The upstream nats charts don't specify namespaces for each attribute.
This works with helm update, but not helm template which holos uses to
render the yaml.

The missing namespace causes flux to fail.

This patch uses the flux kustomization to add the target namespace to
all resources.
2024-04-10 21:54:58 -07:00
Jeff McCune
df705bd79f (#121) Fix Multiple Charts cause holos render to fail
When rendering a holos component which contains more than one helm chart, rendering fails.  It should succeed.

```
holos render --cluster-name=k2 /home/jeff/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/holos/... --log-level debug
```

```
9:03PM ERR could not execute version=0.64.2 err="could not rename: rename /home/jeff/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/holos/nats/envs/vendor553679311 /home/jeff/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/holos/nats/envs/vendor: file exists" loc=helm.go:145
```

This patch fixes the problem by moving each child item of the temporary
directory charts are installed into.  This avoids the problem of moving
the parent when the parent target already exists.
2024-04-10 21:27:39 -07:00
Jeff McCune
bf5765c9cb (#110) Update ZITADEL to v2.49.1 from v2.46.0
Attempt to resolve issue where `/oauth/v2/keys` returns `{"keys": []}`
causing id token verification failures.

Closes: #110
2024-04-07 17:20:10 -07:00
Jeff McCune
04158485c7 (#96) Do not expire ZITADEL signing public key
The public key needs to be configured along with the signing key.
2024-04-05 10:52:36 -07:00
Jeff McCune
cf83c77280 (#96) Do not expire ZITADEL signing private key
Without this patch users encounter an error from istio because it does
not have a valid Jwks from ZITADEL to verify the request when processing
a `RequestAuthentication` policy.

Fixes error `AuthProxy JWKS Error - Jwks doesn't have key to match kid or alg from Jwt`.

Occurs when accessing a protected URL for the first time after tokens have expired.
2024-04-04 15:56:00 -07:00
Jeff McCune
6e545b13dd (#104) Deploy crunchy monitoring stack for ZITADEL
Not exposed via the ingress gateway, but accessible via

    kubectl port-forward svc/crunchy-grafana 3000

Refer to [day two monitoring][1].  This is pretty much a straight copy
of the upstream kustomize configs at [2].

[1]: https://access.crunchydata.com/documentation/postgres-operator/5.5/tutorials/day-two/monitoring
[2]: https://github.com/CrunchyData/postgres-operator-examples/tree/main/kustomize/monitoring
2024-04-04 15:40:07 -07:00
Jeff McCune
bf258a1f41 (#104) Enable monitoring for ZITADEL postgres
This patch enables the monitoring configuration for the ZITADEL postgres
cluster.

Refer to: https://access.crunchydata.com/documentation/postgres-operator/5.5/tutorials/day-two/monitoring

Integrating with:
https://github.com/CrunchyData/postgres-operator-examples/tree/main/kustomize/monitoring
which will become a separate holos component instance.
2024-04-03 22:26:38 -07:00
Jeff McCune
6f06c73d6f (#85) Initial addition of kube-prometheus-stack
Grafana does not yet have the istio sidecar.  Prometheus is accessible
through the auth proxy.  Cert manager is added to the workload clusters
so tls certs can be issued for webhooks, the kube-prom-stack helm chart
uses cert manager for this purpose.

With this patch Grafana is integrated with OIDC and I'm able to log in
as an Administrator.
2024-04-03 21:29:26 -07:00
Jeff McCune
bcb02b5c5c (#47) Remove the prod-iam-zitadel namespace
No longer needed, cluster has moved to prod-iam namespace.
2024-04-03 15:10:30 -07:00
Jeff McCune
0736c7de1a (#47) Bind ALL VirtualServices to the default gateway
Problem:
The VirtualService that catches auth routes for paths, e.g.
`/holos/authproxy/istio-ingress` is bound to the default gateway which
no longer exists because it has no hosts.

Solution:
It's unnecessary and complicated to create a Gateway for every project.
Instead, put all server entries into one `default` gateway and
consolidate the list using CUE.

Result:
It's easier to reason about this system.  There is only one ingress
gateway, `default` and everything gets added to it.  VirtualServices
need only bind to this gateway, which has a hosts entry appropriately
namespaced for the project.
2024-04-03 14:56:40 -07:00
Jeff McCune
28be9f9fbb (#47) Use the project specific Gateway
The login service is unavailable because the wrong gateway is used.
When using projects the VS needs to attach to the correct Gateway.
2024-04-03 12:59:48 -07:00
Jeff McCune
647681de38 (#99) Restore backups from prod-iam namespace
This patch configures the standby cluster to restore backups from the
prod-iam namespace instead of the prod-iam-zitadel namespace.
2024-04-03 12:30:12 -07:00
Jeff McCune
81beb5c539 (#47) Restore ZITADEL from existing backups
Problem:
The ZITADEL database isn't restoring into the prod-iam namespace after
moving from prod-iam-zitadel because no backup exists at the bucket
path.

Solution:
Hard-code the path to the old namespace to restore the database.  We'll
figure out how to move the backups to the new location in a follow up
change.
2024-04-03 11:44:16 -07:00
Jeff McCune
5c1e0a29c8 (#47) Have Ceph depend on secret stores
Another kustomization reconciling too early.
2024-04-03 11:22:15 -07:00
Jeff McCune
01ac5276a9 (#47) Have Gateway depend on secret stores
The `prod-platform-gateway` kustomization is reconciling early:

ExternalSecret/istio-ingress/argocd.ois.run dry-run failed: failed to
get API group resources: unable to retrieve the complete list of server
APIs: external-secrets.io/v1beta1: the server could not find the
requested resource
2024-04-03 11:20:15 -07:00
Jeff McCune
e40594ad8e (#47) Move ZITADEL to prod-iam project namespace
This patch moves ZITADEL from the prod-iam-zitadel namespace to the
projects managed prod-iam namespace, which is the prod environment of
the prod stage of the iam project.
2024-04-03 11:06:55 -07:00
Jeff McCune
bc9c6a622a (#97) Increase ZITADEL pgdata volume to 20Gi
Problem:

```
❯ k exec zitadel-pgha1-4npq-0 -it -- bash
Defaulted container "database" out of: database, replication-cert-copy, pgbackrest, pgbackrest-config, postgres-startup (init), nss-wrapper-init (init)
bash-4.4$ df -h
Filesystem      Size  Used Avail Use% Mounted on
overlay         119G   51G   68G  43% /
tmpfs            64M     0   64M   0% /dev
/dev/rbd3       9.8G  9.8G     0 100% /pgdata
/dev/sda6       119G   51G   68G  43% /tmp
tmpfs            16G   24K   16G   1% /pgconf/tls
tmpfs            16G   24K   16G   1% /etc/database-containerinfo
tmpfs            16G   16K   16G   1% /etc/patroni
tmpfs            16G     0   16G   0% /dev/shm
tmpfs            16G   28K   16G   1% /etc/pgbackrest/conf.d
tmpfs            16G   12K   16G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs           7.9G     0  7.9G   0% /proc/acpi
tmpfs           7.9G     0  7.9G   0% /proc/scsi
tmpfs           7.9G     0  7.9G   0% /sys/firmware
```
2024-04-03 10:09:49 -07:00
Jeff McCune
17f22199b7 (#86) ArgoCD - Disable Dex
Not needed
2024-04-02 15:47:22 -07:00
Jeff McCune
7e93fe4535 (#86) ArgoCD
Using the Helm chart so we can inject the istio sidecar with a kustomize
patch and tweak the configs for OIDC integration.

Login works, istio sidecar is injected.  ArgoCD can only be configured
with one domain unfortunately, it's not accessible at argocd.ois.run,
only argocd.k2.ois.run (or whatever cluster it's installed into).

Ideally it would use the Host header but it does not.

RBAC is not implemented but the User Info endpoint does have group
membership so this shouldn't be a problem to implement.
2024-04-02 15:33:47 -07:00
Jeff McCune
2e98df3572 (#86) ArgoCD in prod-platform project namespace
Deploys using the official release yaml.
2024-04-02 13:34:03 -07:00
Jeff McCune
3b561de413 (#93) Custom AuthPolicy rules for vault
This patch defines a #AuthPolicyRules struct which excludes hosts from
the blanket auth policy and includes them in specialized auth policies.
The purpose is to handle special cases like vault requests which have an
`X-Vault-Token` and `X-Vault-Request` header.

Vault does not use jwts so we cannot verify them in the mesh, have to
pass them along to the backend.

Closes: #93
2024-04-02 12:54:31 -07:00
Jeff McCune
0d0dae8742 (#89) Disable project auth proxies by default
Focus on the ingress gateway auth proxy for now and see how far it gets
us.
2024-04-01 21:48:08 -07:00
Jeff McCune
61b4b5bd17 (#89) Refactor auth proxy callbacks
The ingress gateway auth proxy callback conflicts with the project stage
auth proxy callback for the same backend Host: header value.

This patch disambiguates them by the namespace the auth proxy resides
in.
2024-04-01 21:37:52 -07:00
Jeff McCune
0060740b76 (#82) ingress gateway AuthorizationPolicy
This patch adds a `RequestAuthentication` and `AuthorizationPolicy` rule
to protect all requests flowing through the default ingress gateway.

Consider a browser request for httpbin.k2.example.com representing any
arbitrary host with a valid destination inside the service mesh.  The
default ingress gateway will check if there is already an
x-oidc-id-token header, and if so validate the token is issued by
ZITADEL and the aud value contains the ZITADEL project number.

If the header is not present, the request is forwarded to oauth2-proxy
in the istio-ingress namespace.  This auth proxy is configured to start
the oidc auth flow with a redirect back to /holos/oidc/callback of the
Host: value originally provided in the browser request.

Closes: #82
2024-04-01 20:37:34 -07:00
Jeff McCune
bf8a4af579 (#82) ingressgateway ExtAuthzHttp provider
This patch adds an ingress gateway extauthz provider.  Because ZITADEL
returns all applications associated with a ZITADEL project in the aud
claim, it makes sense to have one ingress auth proxy at the initial
ingress gateway so we can get the ID token in the request header for
backend namespaces to match using `RequestAuthentication` and
`AuthorizationPolicy`.

This change likely makes the additional per-stage auth proxies
unnecessary and over-engineered.  Backend namespaces will have access to
the ID token.
2024-04-01 16:53:11 -07:00
Jeff McCune
dc057fe39d (#89) Add platform project hosts for argocd, grafana, and prometheus
Certificates are issued by the provisioner and synced to the workload
clusters.
2024-04-01 13:09:46 -07:00
Jeff McCune
9877ab131a (#89) Platform Project
This patch manages a platform project to host platform level services
like ArgoCD, Kube Prom Stack, Kiali, etc...
2024-04-01 11:46:02 -07:00
Jeff McCune
13aba64cb7 (#66) Move CUSTOM AuthorizationPolicy to env namespace
It doesn't make sense to link the stage ext authz provider to the
ingress gateway because there can be only one provider per workload.

Link it instead to the backend environment and use the
`security.holos.run/authproxy` label to match the workload.
2024-03-31 18:56:14 -07:00
Jeff McCune
fe9bc2dbfc (#81) Istio 1.21.0 2024-03-31 12:51:56 -07:00
Jeff McCune
c53b682852 (#66) Use x-oidc-id-token instead of authorization header
Problem:
Backend services and web apps expect to place their own credentials into
the Authorization header.  oauth2-proxy writes over the authorization
header creating a conflict.

Solution:
Use the alpha configuration to place the id token into the
x-oidc-id-token header and configure the service mesh to authenticate
requests that have this header in place.

Note: ZITADEL does not use a JWT for an access token, unlike Keycloak
and Dex.  The access token is not compatible with a
RequestAuthentication jwt rule so we must use the id token.
2024-03-31 11:41:23 -07:00
Jeff McCune
3aca6a9e4c (#66) configure auth proxies to set Authorization: Bearer header
Without this patch the istio RequestAuthentication resources fail to
match because the access token from ZITADEL returned by oauth2-proxy in
the x-auth-request-access-token header is not a proper jwt.

The error is:

```
Jwt is not in the form of Header.Payload.Signature with two dots and 3 sections
```

This patch works around the problem by configuring oauth2-proxy to set
the ID token, which is guaranteed to be a proper JWT in the
authorization response headers.

Unfortunately, oauth2-proxy will only place the ID token in the
Authorization header response, which will write over any header set by a
client application.  This is likely to cause problems with single page
apps.

We'll probably need to work around this issue by using the alpha
configuration to set the id token in some out-of-the-way header.  We've
done this before, it'll just take some work to setup the ConfigMap and
translate the config again.
2024-03-30 16:15:27 -07:00
Jeff McCune
40fdfc0317 (#66) Fix auth proxy provider name, stage is always first
dev-holos-authproxy not authproxy-dev-holos
2024-03-30 14:05:50 -07:00
Jeff McCune
25d9415b0a (#66) Fix redis not able to write to /data
Without this patch redis cannot write to the /data directory, which
causes oauth2-proxy to fail with a 500 server error.
2024-03-30 13:40:34 -07:00
Jeff McCune
43c8702398 (#66) Configure an ExtAuthzProxy provider for each project stage
This patch configures an istio envoyExtAuthzHttp provider for each stage
in each project.  An example provider for the dev stage of the holos
project is `authproxy-dev-holos`
2024-03-30 11:28:23 -07:00
Jeff McCune
ce94776dbb (#66) Add ZITADEL project and client ids for iam project
core1 and core2 don't render without these resource identifiers in
place.
2024-03-30 09:18:54 -07:00
Jeff McCune
78ab6cd848 (#66) Match /holos/oidc for all hosts in the project stage
This has the same effect and makes the VirtualService much more
manageable, particularly when calling `kubectl get vs -A`.
2024-03-29 22:50:17 -07:00