Compare commits

82 Commits

Author SHA1 Message Date
Jeff McCune
4184619afc (#126) Refactor pkg to internal
pkg folder is not needed.  Move everything internal for now.
2024-04-12 13:56:16 -07:00
Jeff McCune
954dbd1ec8 (#126) Refactor id token acquisition to token package
And add a logout command that deletes the token cache.

The token package is intended for subcommands that need to make API
calls to the holos API server.  Getting a token should be a simple matter
of calling token.Get(), which takes minimal dependencies.
2024-04-12 13:15:03 -07:00
Jeff McCune
30b70e76aa (#126) Add login command
This copies the login command from the previous holos cli.  Wire
dependency injection and the rest of the unnecessary code from kubelogin
are removed, and the logic is streamlined into a single function that
takes a few OIDC-related parameters.

This will need to be extracted out into an infrastructure service so
multiple other command line tools can easily re-use it and get the ID
token into the x-oidc-id-token header.
2024-04-12 12:13:33 -07:00
Jeff McCune
ec6d112711 (#126) Remove hydra and kratos databases
No longer needed for dev.
2024-04-12 10:24:26 -07:00
Jeff McCune
e796c6a763 (#126) Default to DATABASE_URL env var 2024-04-12 10:20:13 -07:00
Jeff McCune
be32201294 (#126) Basic User and Organization Ent models
Get rid of the previous UserIdentity model.  It is no longer part of
the core domain and is instead handled within the context of ZITADEL.
2024-04-12 09:59:40 -07:00
Jeff McCune
5ebc54b5b7 (#124) Go Tools 2024-04-12 09:14:13 -07:00
Jeff McCune
2954a57872 (#120) Fix NATS target namespace
The upstream nats charts don't specify namespaces for each attribute.
This works with helm upgrade, but not with helm template, which holos
uses to render the yaml.

The missing namespace causes flux to fail.

This patch uses the flux kustomization to add the target namespace to
all resources.
2024-04-10 21:54:58 -07:00
Jeff McCune
df705bd79f (#121) Fix Multiple Charts cause holos render to fail
When rendering a holos component which contains more than one helm chart, rendering fails.  It should succeed.

```
holos render --cluster-name=k2 /home/jeff/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/holos/... --log-level debug
```

```
9:03PM ERR could not execute version=0.64.2 err="could not rename: rename /home/jeff/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/holos/nats/envs/vendor553679311 /home/jeff/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/holos/nats/envs/vendor: file exists" loc=helm.go:145
```

This patch fixes the problem by moving each child item of the temporary
directory charts are installed into.  This avoids the problem of moving
the parent when the parent target already exists.
2024-04-10 21:27:39 -07:00
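The fix described in df705bd79f can be illustrated with a minimal sketch. This is not the code from helm.go; it is a Python illustration of the idea of moving each child of the temporary directory instead of renaming the parent. The helper names `move_children` and `demo`, and the chart names `nats` and `nack`, are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def move_children(src: Path, dst: Path) -> list[str]:
    """Move every direct child of src into dst, creating dst if needed.

    Renaming src itself onto dst fails once dst exists; moving the
    children one by one avoids touching the parent at all.
    """
    dst.mkdir(parents=True, exist_ok=True)
    moved = []
    # sorted() materializes the listing before anything is moved.
    for child in sorted(src.iterdir()):
        shutil.move(str(child), str(dst / child.name))
        moved.append(child.name)
    return moved


def demo() -> list[str]:
    """Vendor two hypothetical charts into one shared vendor directory."""
    with tempfile.TemporaryDirectory() as root:
        vendor = Path(root) / "vendor"
        for chart in ("nats", "nack"):
            # Each chart is first installed into its own temporary directory.
            tmp = Path(tempfile.mkdtemp(prefix="vendor", dir=root))
            (tmp / chart).mkdir()
            (tmp / chart / "Chart.yaml").write_text(f"name: {chart}\n")
            move_children(tmp, vendor)
            tmp.rmdir()  # the temporary parent is empty now
        return sorted(p.name for p in vendor.iterdir())
```

Calling `demo()` vendors the second chart even though the `vendor` directory already exists from the first one, which is the case that failed with "file exists".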
Jeff McCune
4e8ce3585d (#115) Minor clean up of cue code 2024-04-10 21:21:16 -07:00
Jeff McCune
ab5f17c3d2 (#115) Fix goreleaser
Import modules to take the direct dependency and prevent go mod tidy
from modifying go.mod and go.sum which causes goreleaser to fail.
2024-04-10 19:09:30 -07:00
Jeff McCune
a8918c74d4 (#115) Angular spike - fix make frontend
And install frontend deps.
2024-04-09 21:03:26 -07:00
Jeff McCune
ae5738d82d (#115) Angular with SSR
Executed:

    ng new
    ng add @angular/ssr

Name: holos
Style: CSS
SSR and SSG?: No

ssr added using ng add following https://angular.io/guide/prerendering
2024-04-09 20:52:42 -07:00
Jeff McCune
bb99aedffa (#115) Remove frontend
Clean up for ng new in angular spike.
2024-04-09 20:35:43 -07:00
Jeff McCune
d6ee1864c8 (#116) Tilt for development
Add Tilt back from holos server

Note with this patch the ec-creds.yaml file needs to be applied to the
provisioner and an external secret used to sync the image pull creds.

With this patch the dev instance is accessible behind the auth proxy.
pgAdmin also works from the Tilt UI.

https://jeff.holos.dev.k2.ois.run/app/start
2024-04-09 20:26:37 -07:00
Jeff McCune
8a4be66277 (#113) Fix goreleaser try 4
Please check in your pipeline what can be changing the following files:
  M go.sum
2024-04-09 16:48:21 -07:00
Jeff McCune
79ce2f8458 (#113) Fix goreleaser try 3 2024-04-09 16:35:38 -07:00
Jeff McCune
3d4ae44ddd (#113) Fix goreleaser try 2
goreleaser fails with Failure: plugin connect-query: could not find protoc plugin for name connect-query - please make sure protoc-gen-connect-query is installed and present on your $PATH
2024-04-09 16:23:35 -07:00
Jeff McCune
1efb1faa40 (#113) Fix goreleaser
git executable must come before actions checkout
2024-04-09 16:04:42 -07:00
Jeff McCune
bfd6a56397 (#113) Fix actions workflows 2024-04-09 15:57:31 -07:00
Jeff McCune
a788f6d8e8 (#112) Refactor config flag handling
Remove the server.Config struct, not needed.  Remove the app struct and
move the configuration to the main holos.Config.ServerConfig.

Add flags specific to server configuration.

With this patch logging is simplified.  Subcommands have a handle on the
top level holos.Config and can get a fully configured logger from
cfg.Logger() after flag parsing happens.
2024-04-09 11:42:24 -07:00
Jeff McCune
80fa91d74d (#112) Rename wrapper package to errors
The wrapper package name doesn't indicate what it's for.  Rename to
errors and delegate to the standard library.
2024-04-08 20:53:58 -07:00
Jeff McCune
db34562e9a (#112) Get tests passing 2024-04-08 20:53:57 -07:00
Jeff McCune
d6af089ab3 (#112) Rename package core to app
Disambiguate the term `core` which should mean the core domain.  The app
is a supporting domain concerned with logging and configuration
initialization early in the life cycle.
2024-04-08 20:53:57 -07:00
Jeff McCune
b3a70c5911 (#112) Copy holos-server to holos server subcommand
From holos-server commit da35fe966ded2098fe069293ec30864775a6c4f0

Compiles but needs cleanup
2024-04-08 20:53:25 -07:00
Jeff McCune
bf5765c9cb (#110) Update ZITADEL to v2.49.1 from v2.46.0
Attempt to resolve issue where `/oauth/v2/keys` returns `{"keys": []}`
causing id token verification failures.

Closes: #110
2024-04-07 17:20:10 -07:00
Jeff McCune
6c7697648c (#110) Add runbook to take a full database backup
This runbook documents how to write a full database backup to a blank S3
bucket given an existing postgrescluster resource with a live, running
database.

The pgo controller needs to remove and re-create the repo for the backup
to succeed, otherwise it complains about a missing file expected from a
previous backup.
2024-04-07 17:20:07 -07:00
Jeff McCune
04158485c7 (#96) Do not expire ZITADEL signing public key
The public key needs to be configured along with the signing key.
2024-04-05 10:52:36 -07:00
Jeff McCune
cf83c77280 (#96) Do not expire ZITADEL signing private key
Without this patch users encounter an error from istio because it does
not have a valid Jwks from ZITADEL to verify the request when processing
a `RequestAuthentication` policy.

Fixes error `AuthProxy JWKS Error - Jwks doesn't have key to match kid or alg from Jwt`.

Occurs when accessing a protected URL for the first time after tokens have expired.
2024-04-04 15:56:00 -07:00
Jeff McCune
6e545b13dd (#104) Deploy crunchy monitoring stack for ZITADEL
Not exposed via the ingress gateway, but accessible via

    kubectl port-forward svc/crunchy-grafana 3000

Refer to [day two monitoring][1].  This is pretty much a straight copy
of the upstream kustomize configs at [2].

[1]: https://access.crunchydata.com/documentation/postgres-operator/5.5/tutorials/day-two/monitoring
[2]: https://github.com/CrunchyData/postgres-operator-examples/tree/main/kustomize/monitoring
2024-04-04 15:40:07 -07:00
Jeff McCune
bf258a1f41 (#104) Enable monitoring for ZITADEL postgres
This patch enables the monitoring configuration for the ZITADEL postgres
cluster.

Refer to: https://access.crunchydata.com/documentation/postgres-operator/5.5/tutorials/day-two/monitoring

Integrating with:
https://github.com/CrunchyData/postgres-operator-examples/tree/main/kustomize/monitoring
which will become a separate holos component instance.
2024-04-03 22:26:38 -07:00
Jeff McCune
6f06c73d6f (#85) Initial addition of kube-prometheus-stack
Grafana does not yet have the istio sidecar.  Prometheus is accessible
through the auth proxy.  Cert manager is added to the workload clusters
so tls certs can be issued for webhooks, the kube-prom-stack helm chart
uses cert manager for this purpose.

With this patch Grafana is integrated with OIDC and I'm able to log in
as an Administrator.
2024-04-03 21:29:26 -07:00
Jeff McCune
a689c53a9c (#47) v0.62.1 - Projects v1alpha1 milestone complete 2024-04-03 15:32:34 -07:00
Jeff McCune
58cdda1d35 Merge pull request #100 from holos-run/jeff/47-iam-v2
(#47) Remove the prod-iam-zitadel namespace
2024-04-03 15:23:48 -07:00
Jeff McCune
bcb02b5c5c (#47) Remove the prod-iam-zitadel namespace
No longer needed, cluster has moved to prod-iam namespace.
2024-04-03 15:10:30 -07:00
Jeff McCune
0736c7de1a (#47) Bind ALL VirtualServices to the default gateway
Problem:
The VirtualService that catches auth routes for paths, e.g.
`/holos/authproxy/istio-ingress` is bound to the default gateway which
no longer exists because it has no hosts.

Solution:
It's unnecessary and complicated to create a Gateway for every project.
Instead, put all server entries into one `default` gateway and
consolidate the list using CUE.

Result:
It's easier to reason about this system.  There is only one ingress
gateway, `default` and everything gets added to it.  VirtualServices
need only bind to this gateway, which has a hosts entry appropriately
namespaced for the project.
2024-04-03 14:56:40 -07:00
Jeff McCune
28be9f9fbb (#47) Use the project specific Gateway
The login service is unavailable because the wrong gateway is used.
When using projects the VS needs to attach to the correct Gateway.
2024-04-03 12:59:48 -07:00
Jeff McCune
647681de38 (#99) Restore backups from prod-iam namespace
This patch configures the standby cluster to restore backups from the
prod-iam namespace instead of the prod-iam-zitadel namespace.
2024-04-03 12:30:12 -07:00
Jeff McCune
81beb5c539 (#47) Restore ZITADEL from existing backups
Problem:
The ZITADEL database isn't restoring into the prod-iam namespace after
moving from prod-iam-zitadel because no backup exists at the bucket
path.

Solution:
Hard-code the path to the old namespace to restore the database.  We'll
figure out how to move the backups to the new location in a follow up
change.
2024-04-03 11:44:16 -07:00
Jeff McCune
5c1e0a29c8 (#47) Have Ceph depend on secret stores
Another kustomization reconciling too early.
2024-04-03 11:22:15 -07:00
Jeff McCune
01ac5276a9 (#47) Have Gateway depend on secret stores
The `prod-platform-gateway` kustomization is reconciling early:

ExternalSecret/istio-ingress/argocd.ois.run dry-run failed: failed to
get API group resources: unable to retrieve the complete list of server
APIs: external-secrets.io/v1beta1: the server could not find the
requested resource
2024-04-03 11:20:15 -07:00
Jeff McCune
e40594ad8e (#47) Move ZITADEL to prod-iam project namespace
This patch moves ZITADEL from the prod-iam-zitadel namespace to the
projects managed prod-iam namespace, which is the prod environment of
the prod stage of the iam project.
2024-04-03 11:06:55 -07:00
Jeff McCune
bc9c6a622a (#97) Increase ZITADEL pgdata volume to 20Gi
Problem:

```
❯ k exec zitadel-pgha1-4npq-0 -it -- bash
Defaulted container "database" out of: database, replication-cert-copy, pgbackrest, pgbackrest-config, postgres-startup (init), nss-wrapper-init (init)
bash-4.4$ df -h
Filesystem      Size  Used Avail Use% Mounted on
overlay         119G   51G   68G  43% /
tmpfs            64M     0   64M   0% /dev
/dev/rbd3       9.8G  9.8G     0 100% /pgdata
/dev/sda6       119G   51G   68G  43% /tmp
tmpfs            16G   24K   16G   1% /pgconf/tls
tmpfs            16G   24K   16G   1% /etc/database-containerinfo
tmpfs            16G   16K   16G   1% /etc/patroni
tmpfs            16G     0   16G   0% /dev/shm
tmpfs            16G   28K   16G   1% /etc/pgbackrest/conf.d
tmpfs            16G   12K   16G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs           7.9G     0  7.9G   0% /proc/acpi
tmpfs           7.9G     0  7.9G   0% /proc/scsi
tmpfs           7.9G     0  7.9G   0% /sys/firmware
```
2024-04-03 10:09:49 -07:00
Jeff McCune
17f22199b7 (#86) ArgoCD - Disable Dex
Not needed
2024-04-02 15:47:22 -07:00
Jeff McCune
7e93fe4535 (#86) ArgoCD
Using the Helm chart so we can inject the istio sidecar with a kustomize
patch and tweak the configs for OIDC integration.

Login works, istio sidecar is injected.  Unfortunately, ArgoCD can only
be configured with one domain, so it's not accessible at argocd.ois.run,
only at argocd.k2.ois.run (or whatever cluster it's installed into).

Ideally it would use the Host header but it does not.

RBAC is not implemented but the User Info endpoint does have group
membership so this shouldn't be a problem to implement.
2024-04-02 15:33:47 -07:00
Jeff McCune
2e98df3572 (#86) ArgoCD in prod-platform project namespace
Deploys using the official release yaml.
2024-04-02 13:34:03 -07:00
Jeff McCune
3b561de413 (#93) Custom AuthPolicy rules for vault
This patch defines a #AuthPolicyRules struct which excludes hosts from
the blanket auth policy and includes them in specialized auth policies.
The purpose is to handle special cases like vault requests which have an
`X-Vault-Token` and `X-Vault-Request` header.

Vault does not use JWTs, so we cannot verify them in the mesh and have
to pass them along to the backend.

Closes: #93
2024-04-02 12:54:31 -07:00
Jeff McCune
0d0dae8742 (#89) Disable project auth proxies by default
Focus on the ingress gateway auth proxy for now and see how far it gets
us.
2024-04-01 21:48:08 -07:00
Jeff McCune
61b4b5bd17 (#89) Refactor auth proxy callbacks
The ingress gateway auth proxy callback conflicts with the project stage
auth proxy callback for the same backend Host: header value.

This patch disambiguates them by the namespace the auth proxy resides
in.
2024-04-01 21:37:52 -07:00
Jeff McCune
0060740b76 (#82) ingress gateway AuthorizationPolicy
This patch adds a `RequestAuthentication` and `AuthorizationPolicy` rule
to protect all requests flowing through the default ingress gateway.

Consider a browser request for httpbin.k2.example.com representing any
arbitrary host with a valid destination inside the service mesh.  The
default ingress gateway will check if there is already an
x-oidc-id-token header, and if so validate the token is issued by
ZITADEL and the aud value contains the ZITADEL project number.

If the header is not present, the request is forwarded to oauth2-proxy
in the istio-ingress namespace.  This auth proxy is configured to start
the oidc auth flow with a redirect back to /holos/oidc/callback of the
Host: value originally provided in the browser request.

Closes: #82
2024-04-01 20:37:34 -07:00
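The request flow described in 0060740b76 can be sketched as a small decision function. This is an illustration only, not the istio policy itself. The `route_request` name, the `verify` callback (standing in for signature verification and decoding), the return strings, and the "deny" outcome for an invalid token are all assumptions that do not come from the commit message.

```python
from typing import Callable, Mapping, Optional


def route_request(
    headers: Mapping[str, str],
    verify: Callable[[str], Optional[dict]],
    issuer: str,
    project_id: str,
) -> str:
    """Decide what the ingress gateway does with a request.

    headers is assumed to use lowercase header names.  verify is a
    hypothetical callback returning the token claims, or None when the
    token cannot be verified.
    """
    token = headers.get("x-oidc-id-token")
    if token is None:
        # No ID token yet: hand the request to the auth proxy, which
        # starts the oidc flow.
        return "forward-to-auth-proxy"
    claims = verify(token)
    if claims is None:
        return "deny"  # assumption: unverifiable tokens are rejected
    aud = claims.get("aud", [])
    if isinstance(aud, str):
        aud = [aud]
    # The token must come from the expected issuer and list the project
    # in its aud claim.
    if claims.get("iss") == issuer and project_id in aud:
        return "allow"
    return "deny"
```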
Jeff McCune
bf8a4af579 (#82) ingressgateway ExtAuthzHttp provider
This patch adds an ingress gateway extauthz provider.  Because ZITADEL
returns all applications associated with a ZITADEL project in the aud
claim, it makes sense to have one ingress auth proxy at the initial
ingress gateway so we can get the ID token in the request header for
backend namespaces to match using `RequestAuthentication` and
`AuthorizationPolicy`.

This change likely makes the additional per-stage auth proxies
unnecessary and over-engineered.  Backend namespaces will have access to
the ID token.
2024-04-01 16:53:11 -07:00
Jeff McCune
dc057fe39d (#89) Add platform project hosts for argocd, grafana, and prometheus
Certificates are issued by the provisioner and synced to the workload
clusters.
2024-04-01 13:09:46 -07:00
Jeff McCune
9877ab131a (#89) Platform Project
This patch manages a platform project to host platform level services
like ArgoCD, Kube Prom Stack, Kiali, etc...
2024-04-01 11:46:02 -07:00
Jeff McCune
13aba64cb7 (#66) Move CUSTOM AuthorizationPolicy to env namespace
It doesn't make sense to link the stage ext authz provider to the
ingress gateway because there can be only one provider per workload.

Link it instead to the backend environment and use the
`security.holos.run/authproxy` label to match the workload.
2024-03-31 18:56:14 -07:00
Jeff McCune
fe9bc2dbfc (#81) Istio 1.21.0 2024-03-31 12:51:56 -07:00
Jeff McCune
c53b682852 (#66) Use x-oidc-id-token instead of authorization header
Problem:
Backend services and web apps expect to place their own credentials into
the Authorization header.  oauth2-proxy writes over the authorization
header creating a conflict.

Solution:
Use the alpha configuration to place the id token into the
x-oidc-id-token header and configure the service mesh to authenticate
requests that have this header in place.

Note: ZITADEL does not use a JWT for an access token, unlike Keycloak
and Dex.  The access token is not compatible with a
RequestAuthentication jwt rule so we must use the id token.
2024-03-31 11:41:23 -07:00
Jeff McCune
3aca6a9e4c (#66) configure auth proxies to set Authorization: Bearer header
Without this patch the istio RequestAuthentication resources fail to
match because the access token from ZITADEL returned by oauth2-proxy in
the x-auth-request-access-token header is not a proper jwt.

The error is:

```
Jwt is not in the form of Header.Payload.Signature with two dots and 3 sections
```

This patch works around the problem by configuring oauth2-proxy to set
the ID token, which is guaranteed to be a proper JWT in the
authorization response headers.

Unfortunately, oauth2-proxy will only place the ID token in the
Authorization header response, which will write over any header set by a
client application.  This is likely to cause problems with single page
apps.

We'll probably need to work around this issue by using the alpha
configuration to set the id token in some out-of-the-way header.  We've
done this before; it'll just take some work to set up the ConfigMap and
translate the config again.
2024-03-30 16:15:27 -07:00
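The error quoted in 3aca6a9e4c is about the shape of the token only. A minimal sketch of that structural check follows. The function name is hypothetical, and the check does not verify a signature or decode anything. It only shows why an opaque access token fails before any claim is inspected.

```python
def looks_like_compact_jwt(token: str) -> bool:
    """Return True when token has the Header.Payload.Signature shape.

    This is a structural check only: exactly two dots, giving three
    sections.  It says nothing about whether the token is valid.
    """
    return token.count(".") == 2
```

An ID token serialized as a JWT passes this check, while an opaque access token without dots does not.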
Jeff McCune
40fdfc0317 (#66) Fix auth proxy provider name, stage is always first
dev-holos-authproxy not authproxy-dev-holos
2024-03-30 14:05:50 -07:00
Jeff McCune
25d9415b0a (#66) Fix redis not able to write to /data
Without this patch redis cannot write to the /data directory, which
causes oauth2-proxy to fail with a 500 server error.
2024-03-30 13:40:34 -07:00
Jeff McCune
43c8702398 (#66) Configure an ExtAuthzProxy provider for each project stage
This patch configures an istio envoyExtAuthzHttp provider for each stage
in each project.  An example provider for the dev stage of the holos
project is `authproxy-dev-holos`
2024-03-30 11:28:23 -07:00
Jeff McCune
ce94776dbb (#66) Add ZITADEL project and client ids for iam project
core1 and core2 don't render without these resource identifiers in
place.
2024-03-30 09:18:54 -07:00
Jeff McCune
78ab6cd848 (#66) Match /holos/oidc for all hosts in the project stage
This has the same effect and makes the VirtualService much more
manageable, particularly when calling `kubectl get vs -A`.
2024-03-29 22:50:17 -07:00
Jeff McCune
0a7001f868 (#66) Configure the primary domain for zitadel
This bypasses the account selection screen and automatically redirects
back to the application without user interaction.
2024-03-29 22:44:52 -07:00
Jeff McCune
2db7be671b (#66) Route prefix /holos/oidc to authproxy
This patch configures the service mesh to route all requests with a uri
path prefix of `/holos/oidc` to the auth proxy associated with the
project stage.

Consider a request to https://jeff.holos.dev.k2.ois.run/holos/oidc/sign_in

This request is usually routed to the backend app, but
VirtualService/authproxy in the dev-holos-system namespace matches the
request and routes it to the auth proxy instead.

The auth proxy matches the request Host: header against the whitelist
and cookiedomain setting, which matches the suffix
`.holos.dev.k2.ois.run`.  The auth proxy redirects to the oidc issuer
with a callback url of the request Host for a url of
`https://jeff.holos.dev.k2.ois.run/holos/oidc/callback`.

ZITADEL matches the callback against those registered with the app and
the app client id.  A code is then sent back to the auth proxy.

The auth proxy sets a cookie named `__Secure-authproxy-dev-holos` with a
domain of `.holos.dev.k2.ois.run` from the suffix match of the
`--cookiedomain` flag.

Because this all works using paths, the `auth` prefix domains have been
removed.  They're unnecessary because oauth2-proxy is available for any
host routed to the project stage at path prefix `/holos/oidc`.

Refer to https://oauth2-proxy.github.io/oauth2-proxy/features/endpoints/
for useful endpoints for debugging, replacing `/oauth2` with `/holos/oidc`.
2024-03-29 21:56:46 -07:00
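The suffix match and callback url described in 2db7be671b can be sketched as follows. This is a simplified illustration, not oauth2-proxy's implementation, whose matching rules may differ. The function names and the longest-match tie-break are assumptions.

```python
from typing import Optional


def match_cookie_domain(host: str, cookie_domains: list[str]) -> Optional[str]:
    """Return the longest configured domain that is a suffix of host.

    Returns None when no configured domain matches.  Any port on the
    Host value is ignored.
    """
    name = host.split(":")[0].lower()
    matches = [d for d in cookie_domains if name.endswith(d.lower())]
    return max(matches, key=len) if matches else None


def callback_url(host: str, prefix: str = "/holos/oidc") -> str:
    """Build the callback url on the same host the browser requested."""
    return f"https://{host}{prefix}/callback"
```

For the request in the commit message, `match_cookie_domain("jeff.holos.dev.k2.ois.run", [".holos.dev.k2.ois.run"])` yields `.holos.dev.k2.ois.run`, and `callback_url("jeff.holos.dev.k2.ois.run")` yields the callback url quoted above.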
Jeff McCune
b51870f7bf (#66) Deploy oauth2-proxy and redis to stage namespaces
This patch deploys oauth2-proxy and redis to the system namespace of
each stage in each project.  The plan is to redirect unauthenticated
requests to the request host at the /holos/oidc/callback endpoint.

This patch removes the --redirect-uri flag, which makes the auth domain
prefix moot, so a future patch should remove those if they really are
unnecessary.

The reason to remove the --redirect-uri flag is to make sure we set the
cookie to a domain suffix of the request Host: header.
2024-03-29 20:56:26 -07:00
Jeff McCune
0227dfa7e5 (#66) Add Gateway entries for oauth2-proxy
This patch adds entries to the project stage Gateway for oauth2-proxy.
Three entries for each stage are added, one for the global endpoint plus
one for each cluster.
2024-03-29 15:30:02 -07:00
Jeff McCune
05b59d9af0 (#66) Refactor project hosts for auth proxy cookies
Without this patch the auth proxy cookie domain is difficult to manage.
This patch refactors the hosts managed for each environment in a project
to better align with security domains and auth proxy session cookies.

The convention is: `<env?>.<host>.<stage?>.<cluster?>.<domain>` where
`host` can be 0..N entries with a default value of `[projectName]`.

env may be omitted for prod or the dev env of the dev stage.  stage may
be omitted for prod.  cluster may be omitted for the global endpoint.

For a project named `holos`:

| Project | Stage | Env  | Cluster | Host                      |
| ------- | ----- | ---  | ------- | ------                    |
| holos   | dev   | jeff | k2      | jeff.holos.dev.k2.ois.run |
| holos   | dev   | jeff | global  | jeff.holos.dev.ois.run    |
| holos   | dev   | -    | k2      | holos.dev.k2.ois.run      |
| holos   | dev   | -    | global  | holos.dev.ois.run         |
| holos   | prod  | -    | k2      | holos.k2.ois.run          |
| holos   | prod  | -    | global  | holos.ois.run             |

Auth proxy:

| Project | Stage | Auth Proxy Host           | Auth Cookie Domain   |
| ------- | ----- | ------                    | ------------------   |
| holos   | dev   | auth.holos.dev.ois.run    | holos.dev.ois.run    |
| holos   | dev   | auth.holos.dev.k1.ois.run | holos.dev.k1.ois.run |
| holos   | dev   | auth.holos.dev.k2.ois.run | holos.dev.k2.ois.run |
| holos   | prod  | auth.holos.ois.run        | holos.ois.run        |
| holos   | prod  | auth.holos.k1.ois.run     | holos.k1.ois.run     |
| holos   | prod  | auth.holos.k2.ois.run     | holos.k2.ois.run     |
2024-03-29 15:30:01 -07:00
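The host convention from 05b59d9af0 can be checked against the first table with a short sketch. The real hosts are produced by CUE in the repository. The `project_host` function, its parameter names, and the use of `None` for an omitted env are assumptions made for illustration.

```python
from typing import Optional


def project_host(
    host: str,
    stage: str,
    env: Optional[str] = None,
    cluster: Optional[str] = None,
    domain: str = "ois.run",
) -> str:
    """Build <env?>.<host>.<stage?>.<cluster?>.<domain>."""
    labels = []
    # env is omitted for prod and for the dev env of the dev stage.
    if env not in (None, "prod") and not (stage == "dev" and env == "dev"):
        labels.append(env)
    labels.append(host)
    # stage is omitted for prod.
    if stage != "prod":
        labels.append(stage)
    # cluster is omitted for the global endpoint.
    if cluster not in (None, "global"):
        labels.append(cluster)
    labels.append(domain)
    return ".".join(labels)
```

For example, `project_host("holos", "dev", env="jeff", cluster="k2")` gives `jeff.holos.dev.k2.ois.run`, and `project_host("holos", "prod", cluster="global")` gives `holos.ois.run`, matching the table rows.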
Jeff McCune
04f9f3b3a8 Merge pull request #79 from holos-run/nate/makefile_version
Show the holos version in 'make install|build'
2024-03-29 15:04:48 -07:00
Nate McCurdy
b58be8b38c Show the holos version in 'make install|build'
Prior to this, when running the 'install' or 'build' Makefile target,
the version of holos being built was not shown even though the 'build'
target attempted to show the version.

```
.PHONY: build
build: generate ## Build holos executable.
	@echo "building ${BIN_NAME} ${VERSION}"
```

For example:
```
> make install
go generate ./...
building holos
...
```

The holos version is stored in pkg/version/embedded/{major,minor,patch},
not the `Version` const. So the fix is to change the value of `VERSION`
so that it comes from those embedded files.

Now the version of holos is shown:

```
> make install
go generate ./...
building holos 0.61.1
...
```

This also adds a new Makefile target called `show-version` which shows
the full version string (i.e. the value of `$VERSION`).
2024-03-29 15:01:33 -07:00
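The idea behind b58be8b38c, assembling the version string from the three embedded files, can be sketched outside of make. This is an illustration under assumptions, not the Makefile rule. The `read_version` and `demo` names are hypothetical, and the demo writes its own sample files rather than reading the repository.

```python
import tempfile
from pathlib import Path


def read_version(embedded_dir: Path) -> str:
    """Join the major, minor and patch files into one version string."""
    parts = [
        (embedded_dir / name).read_text().strip()
        for name in ("major", "minor", "patch")
    ]
    return ".".join(parts)


def demo() -> str:
    """Write sample files for 0.61.1 and read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        embedded = Path(tmp)
        for name, value in (("major", "0"), ("minor", "61"), ("patch", "1")):
            (embedded / name).write_text(value + "\n")
        return read_version(embedded)
```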
Jeff McCune
10493d754a (#66) Add httpbin to each project environment
The goal of this patch is to verify each project environment is wired up
to the ingress Gateway for the project stage.

This is a necessary step to eventually configure the VirtualService and
AuthorizationPolicy to only match on the `/dump/request` path of each
endpoint for troubleshooting.
2024-03-28 21:51:34 -07:00
Jeff McCune
cf28516b8b (#66) Project managed namespaces
This patch uses the existing #ManagedNamespaces definition to create and
manage namespaces on the provisioner and workload clusters so that
SecretStore and eso-creds-refresher resources are managed in the project
environment namespaces and the project stage system namespace.
2024-03-28 15:09:57 -07:00
Jeff McCune
d81e25c4e4 (#66) Project Certificates
Provisioner cluster:

This patch creates a Certificate resource in the provisioner for each
host associated with the project.  By default, one host is created for
each stage with the short hostname set to the project name.

A namespace is also created for each project for eso creds refresher to
manage service accounts for SecretStore resources in the workload
clusters.

Workload cluster:

For each env, plus one system namespace per stage:

 - Namespace per env
 - SecretStore per env
 - ExternalSecret per host in the env

Common names for the holos project, prod stage:

- holos.k1.ois.run
- holos.k2.ois.run
- holos.ois.run

Common names for the holos project, dev stage:

- holos.dev.k1.ois.run
- holos.dev.k2.ois.run
- holos.dev.ois.run
- holos.gary.k1.ois.run
- holos.gary.k2.ois.run
- holos.gary.ois.run
- holos.jeff.k1.ois.run
- holos.jeff.k2.ois.run
- holos.jeff.ois.run
- holos.nate.k1.ois.run
- holos.nate.k2.ois.run
- holos.nate.ois.run

Usage:

    holos render --cluster-name=provisioner \
      ~/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/provisioner/projects/...
    holos render --cluster-name=k1 \
      ~/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/workload/projects/...
    holos render --cluster-name=k2 \
      ~/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/workload/projects/...
2024-03-27 20:54:51 -07:00
Jeff McCune
c4612ff5d2 (#64) Manage one system namespace per project
This patch introduces a new BuildPlan spec.components.resources
collection, which is a map version of
spec.components.kubernetesObjectsList.  The map version is much easier
to work with and produce in CUE than the list version.

The list version should be deprecated and removed prior to public
release.

The projects holos instance renders multiple holos components, each
containing kubernetes api objects defined directly in CUE.

<project>-system is intended for the ext auth proxy providers for all
stages.

<project>-namespaces is intended to create a namespace for each
environment in the project.

The intent is to expand the platform level definition of a project to
include the per-stage auth proxy and per-env role bindings.  Secret
Store and ESO creds refresher resources will also be defined by the
platform level definition of a project.
2024-03-26 12:23:01 -07:00
Jeff McCune
d70acbb47e ignore .vscode 2024-03-22 21:22:06 -07:00
Jeff McCune
3c977d22fe (#71) Final refactoring of example code to use BuildPlan
Need to test it on all the clusters now.  Will follow up with any
necessary fixes.
2024-03-22 16:58:52 -07:00
Jeff McCune
e34db2b583 (#71) Refactor provisioner to produce a BuildPlan 2024-03-22 16:42:57 -07:00
Jeff McCune
71de57ac88 (#71) Refactor optional vault service to BuildPlan 2024-03-22 15:54:52 -07:00
Jeff McCune
c7cc661018 (#71) Refactor Zitadel components for BuildPlan
❯ holos render --cluster-name k2  ~/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/accounts/iam/zitadel/...
3:04PM INF render.go:43 rendered prod-iam-postgres version=0.60.2 status=ok action=rendered name=prod-iam-postgres
3:04PM INF render.go:43 rendered prod-iam-postgres-certs version=0.60.2 status=ok action=rendered name=prod-iam-postgres-certs
3:04PM INF render.go:43 rendered prod-iam-zitadel version=0.60.2 status=ok action=rendered name=prod-iam-zitadel
2024-03-22 15:04:43 -07:00
Jeff McCune
09f39c02fe (#71) Refactor foundation/cloud/secrets components to BuildPlan 2024-03-22 13:50:34 -07:00
Jeff McCune
23c76a73e0 (#71) Refactor pgo components to BuildPlan 2024-03-22 13:29:38 -07:00
Jeff McCune
1cafe08237 (#71) Refactor prod-metal-ceph to use BuildPlan 2024-03-22 12:44:20 -07:00
Jeff McCune
45b07964ef (#71) Refactor the mesh collection to use BuildPlan
This patch refactors the example reference platform to use the new
BuildPlan API.

```
❯ holos render --cluster-name=k2 /home/jeff/workspace/holos-run/holos/docs/examples/platforms/reference/clusters/foundation/cloud/mesh/...
12:19PM INF render.go:43 rendered prod-mesh-cni version=0.60.2 status=ok action=rendered name=prod-mesh-cni
12:19PM INF render.go:43 rendered prod-mesh-gateway version=0.60.2 status=ok action=rendered name=prod-mesh-gateway
12:19PM INF render.go:43 rendered prod-mesh-httpbin version=0.60.2 status=ok action=rendered name=prod-mesh-httpbin
12:19PM INF render.go:43 rendered prod-mesh-ingress version=0.60.2 status=ok action=rendered name=prod-mesh-ingress
12:19PM INF render.go:43 rendered prod-mesh-istiod version=0.60.2 status=ok action=rendered name=prod-mesh-istiod
12:19PM INF render.go:43 rendered prod-mesh-istio-base version=0.60.2 status=ok action=rendered name=prod-mesh-istio-base
```
2024-03-22 12:44:20 -07:00
297 changed files with 89311 additions and 915 deletions

@@ -17,11 +17,31 @@ jobs:
name: lint
runs-on: gha-rs
steps:
- uses: actions/checkout@v4
- uses: actions/setup-go@v5
- name: Checkout code
uses: actions/checkout@v4
- name: Set up Node
uses: actions/setup-node@v4
with:
node-version: 20
- name: Set up Go
uses: actions/setup-go@v5
with:
go-version: stable
cache: false
- name: Install Packages
run: sudo apt update && sudo apt -qq -y install git curl zip unzip tar bzip2 make
- name: Install Tools
run: |
set -x
make tools
make buf
go generate ./...
make frontend
go mod tidy
- name: golangci-lint
uses: golangci/golangci-lint-action@v4
with:

@@ -14,23 +14,46 @@ jobs:
goreleaser:
runs-on: gha-rs
steps:
# Must come before Checkout, otherwise goreleaser fails
- name: Provide GPG and Git
run: sudo apt update && sudo apt -qq -y install gnupg git
run: sudo apt update && sudo apt -qq -y install gnupg git curl zip unzip tar bzip2 make
# Must come after git executable is provided
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Set up Node
uses: actions/setup-node@v4
with:
node-version: 20
- name: Set up Go
uses: actions/setup-go@v5
with:
go-version: stable
# Necessary to run these outside of goreleaser, otherwise
# /home/runner/_work/holos/holos/internal/frontend/node_modules/.bin/protoc-gen-connect-query is not in PATH
- name: Install Tools
run: |
set -x
make tools
make buf
go generate ./...
make frontend
go mod tidy
- name: Import GPG key
uses: crazy-max/ghaction-import-gpg@v6
with:
gpg_private_key: ${{ secrets.GPG_CODE_SIGNING_SECRETKEY }}
passphrase: ${{ secrets.GPG_CODE_SIGNING_PASSPHRASE }}
- name: List keys
run: gpg -K
- name: Set up Go
uses: actions/setup-go@v5
with:
go-version: stable
- name: Run GoReleaser
uses: goreleaser/goreleaser-action@v5
with:

@@ -18,13 +18,18 @@ jobs:
- name: Checkout code
uses: actions/checkout@v4
- name: Set up Node
uses: actions/setup-node@v4
with:
node-version: 20
- name: Set up Go
uses: actions/setup-go@v5
with:
go-version: stable
- name: Provide unzip for Helm
run: sudo apt update && sudo apt -qq -y install curl zip unzip tar bzip2
- name: Install Packages
run: sudo apt update && sudo apt -qq -y install git curl zip unzip tar bzip2 make
- name: Set up Helm
uses: azure/setup-helm@v4
@@ -32,5 +37,14 @@ jobs:
- name: Set up Kubectl
uses: azure/setup-kubectl@v3
- name: Install Tools
run: |
set -x
make tools
make buf
go generate ./...
make frontend
go mod tidy
- name: Test
run: ./scripts/test

5
.gitignore vendored

@@ -1,7 +1,8 @@
bin/
/bin/
vendor/
.idea/
coverage.out
dist/
/dist/
*.hold/
/deploy/
.vscode/


@@ -10,10 +10,8 @@ version: 1
before:
hooks:
# You may remove this if you don't use go modules.
- go mod tidy
# you may remove this if you don't need go generate
- go generate ./...
- go mod tidy
builds:
- main: ./cmd/holos
@@ -23,6 +21,9 @@ builds:
- linux
- windows
- darwin
goarch:
- amd64
- arm64
signs:
- artifacts: checksum


@@ -4,7 +4,7 @@ PROJ=holos
ORG_PATH=github.com/holos-run
REPO_PATH=$(ORG_PATH)/$(PROJ)
VERSION := $(shell grep "const Version " pkg/version/version.go | sed -E 's/.*"(.+)"$$/\1/')
VERSION := $(shell cat version/embedded/major version/embedded/minor version/embedded/patch | xargs printf "%s.%s.%s")
BIN_NAME := holos
DOCKER_REPO=quay.io/openinfrastructure/holos
@@ -12,11 +12,14 @@ IMAGE_NAME=$(DOCKER_REPO)
$(shell mkdir -p bin)
# For buf plugin protoc-gen-connect-es
export PATH := $(PWD)/internal/frontend/holos/node_modules/.bin:$(PATH)
GIT_COMMIT=$(shell git rev-parse HEAD)
GIT_TREE_STATE=$(shell test -n "`git status --porcelain`" && echo "dirty" || echo "clean")
BUILD_DATE=$(shell date -Iseconds)
LD_FLAGS="-w -X ${ORG_PATH}/${PROJ}/pkg/version.GitCommit=${GIT_COMMIT} -X ${ORG_PATH}/${PROJ}/pkg/version.GitTreeState=${GIT_TREE_STATE} -X ${ORG_PATH}/${PROJ}/pkg/version.BuildDate=${BUILD_DATE}"
LD_FLAGS="-w -X ${ORG_PATH}/${PROJ}/version.GitCommit=${GIT_COMMIT} -X ${ORG_PATH}/${PROJ}/version.GitTreeState=${GIT_TREE_STATE} -X ${ORG_PATH}/${PROJ}/version.BuildDate=${BUILD_DATE}"
.PHONY: default
default: test
@@ -39,6 +42,10 @@ bumpmajor: ## Bump the major version.
scripts/bump minor 0
scripts/bump patch 0
.PHONY: show-version
show-version: ## Print the full version.
@echo $(VERSION)
.PHONY: tidy
tidy: ## Tidy go module.
go mod tidy
@@ -90,6 +97,40 @@ coverage: test ## Test coverage profile.
snapshot: ## Go release snapshot
goreleaser release --snapshot --clean
.PHONY: buf
buf: ## buf generate
cd service && buf mod update
buf generate
.PHONY: tools
tools: go-deps frontend-deps ## install tool dependencies
.PHONY: go-deps
go-deps: ## tool versions pinned in tools.go
go install github.com/bufbuild/buf/cmd/buf
go install github.com/fullstorydev/grpcurl/cmd/grpcurl
go install google.golang.org/protobuf/cmd/protoc-gen-go
go install connectrpc.com/connect/cmd/protoc-gen-connect-go
go install honnef.co/go/tools/cmd/staticcheck@latest
# curl https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | bash
.PHONY: frontend-deps
frontend-deps: ## Setup npm and vite
cd internal/frontend/holos && npm install
cd internal/frontend/holos && npm install --save-dev @bufbuild/buf @connectrpc/protoc-gen-connect-es
cd internal/frontend/holos && npm install @connectrpc/connect @connectrpc/connect-web @bufbuild/protobuf
# https://github.com/connectrpc/connect-query-es/blob/1350b6f07b6aead81793917954bdb1cc3ce09df9/packages/protoc-gen-connect-query/README.md?plain=1#L23
cd internal/frontend/holos && npm install --save-dev @connectrpc/protoc-gen-connect-query @bufbuild/protoc-gen-es
cd internal/frontend/holos && npm install @connectrpc/connect-query @bufbuild/protobuf
.PHONY: frontend
frontend: buf
mkdir -p internal/frontend/holos/dist
cd internal/frontend/holos/dist && rm -rf app
cd internal/frontend/holos && ng build
touch internal/frontend/frontend.go
.PHONY: help
help: ## Display this help menu.
@awk 'BEGIN {FS = ":.*##"; printf "\nUsage:\n make \033[36m<target>\033[0m\n"} /^[a-zA-Z_0-9-]+:.*?##/ { printf " \033[36m%-20s\033[0m %s\n", $$1, $$2 } /^##@/ { printf "\n\033[1m%s\033[0m\n", substr($$0, 5) } ' $(MAKEFILE_LIST)
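
The `LD_FLAGS` change in the Makefile hunk above moves the `-X` targets from `pkg/version` to `version`. As a minimal sketch of what those flags act on, assuming the target variables are package-level strings (the linker's `-X` flag only sets string variables), the receiving side might look like the following. The variable names are taken from the flags; the default values and the `buildInfo` helper are hypothetical, and the variables live in `main` here only to keep the example self-contained.

```go
package main

import "fmt"

// Package-level string variables that the linker can overwrite at build time.
// The "unknown" defaults are an assumption for illustration.
var (
	GitCommit    = "unknown"
	GitTreeState = "unknown"
	BuildDate    = "unknown"
)

// buildInfo formats the build metadata into a single line (hypothetical helper).
func buildInfo() string {
	return fmt.Sprintf("commit=%s tree=%s built=%s", GitCommit, GitTreeState, BuildDate)
}

func main() {
	fmt.Println(buildInfo())
}
```

Built with `go build -ldflags "-X main.GitCommit=$(git rev-parse HEAD)"`, the sketch reports the commit instead of the default. In the repository the import path in the flag has to match the package that declares the variables, which is why the flags change together with the package move.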

315
Tiltfile Normal file

@@ -0,0 +1,315 @@
# -*- mode: Python -*-
# This Tiltfile manages a Go project with live reload in Kubernetes
listen_port = 3000
metrics_port = 9090
# Use our wrapper to set the kube namespace
if os.getenv('TILT_WRAPPER') != '1':
fail("could not run, ./hack/tilt/bin/tilt was not used to start tilt")
# AWS Account to work in
aws_account = '271053619184'
aws_region = 'us-east-2'
# Resource ids
holos_backend = 'Holos Backend'
pg_admin = 'pgAdmin'
pg_cluster = 'PostgresCluster'
pg_svc = 'Database Pod'
compile_id = 'Go Build'
auth_id = 'Auth Policy'
lint_id = 'Run Linters'
tests_id = 'Run Tests'
# PostgresCluster resource name in k8s
pg_cluster_name = 'holos'
# Database name inside the PostgresCluster
pg_database_name = 'holos'
# PGAdmin name
pg_admin_name = 'pgadmin'
# Default Registry.
# See: https://github.com/tilt-dev/tilt.build/blob/master/docs/choosing_clusters.md#manual-configuration
# Note, Tilt will append the image name to the registry uri path
default_registry('{account}.dkr.ecr.{region}.amazonaws.com/holos-run/holos-server'.format(account=aws_account, region=aws_region))
# Set a name prefix specific to the user. Multiple developers share the tilt-holos namespace.
developer = os.getenv('USER')
holos_server = 'holos'
# See ./hack/tilt/bin/tilt
namespace = os.getenv('NAMESPACE')
# We always develop against the k1 cluster.
os.putenv('KUBECONFIG', os.path.abspath('./hack/tilt/kubeconfig'))
# The context defined in ./hack/tilt/kubeconfig
allow_k8s_contexts('sso@k1')
allow_k8s_contexts('sso@k2')
allow_k8s_contexts('sso@k3')
allow_k8s_contexts('sso@k4')
allow_k8s_contexts('sso@k5')
# PG db connection for localhost -> k8s port-forward
os.putenv('PGHOST', 'localhost')
os.putenv('PGPORT', '15432')
# We always develop in the dev aws account.
os.putenv('AWS_CONFIG_FILE', os.path.abspath('./hack/tilt/aws.config'))
os.putenv('AWS_ACCOUNT', aws_account)
os.putenv('AWS_DEFAULT_REGION', aws_region)
os.putenv('AWS_PROFILE', 'dev-holos')
os.putenv('AWS_SDK_LOAD_CONFIG', '1')
# Authenticate to AWS ECR when tilt up is run by the developer
local_resource('AWS Credentials', './hack/tilt/aws-login.sh', auto_init=True)
# Extensions are open-source, pre-packaged functions that extend Tilt
#
# More info: https://github.com/tilt-dev/tilt-extensions
# More info: https://docs.tilt.dev/extensions.html
load('ext://restart_process', 'docker_build_with_restart')
load('ext://k8s_attach', 'k8s_attach')
load('ext://git_resource', 'git_checkout')
load('ext://uibutton', 'cmd_button')
# Paths edited by the developer Tilt watches to trigger compilation.
# Generated files should be excluded to avoid an infinite build loop.
developer_paths = [
'./cmd',
'./internal/server',
'./internal/ent/schema',
'./frontend/package-lock.json',
'./frontend/src',
'./go.mod',
'./pkg',
'./service/holos',
]
# Builds the holos-server executable
local_resource(compile_id, 'make build', deps=developer_paths)
# Build Docker image
# Tilt will automatically associate image builds with the resource(s)
# that reference them (e.g. via Kubernetes or Docker Compose YAML).
#
# More info: https://docs.tilt.dev/api.html#api.docker_build
#
docker_build_with_restart(
'holos',
context='.',
entrypoint=[
'/app/bin/holos',
'server',
'--listen-port={}'.format(listen_port),
'--oidc-issuer=https://login.ois.run',
'--oidc-audience=262096764402729854@holos_platform',
'--metrics-port={}'.format(metrics_port),
],
dockerfile='./hack/tilt/Dockerfile',
only=['./bin'],
# (Recommended) Updating a running container in-place
# https://docs.tilt.dev/live_update_reference.html
live_update=[
# Sync files from host to container
sync('./bin', '/app/bin'),
# Wait for aws-login https://github.com/tilt-dev/tilt/issues/3048
sync('./tilt/aws-login.last', '/dev/null'),
# Execute commands in the container when paths change
# run('/app/hack/codegen.sh', trigger=['./app/api'])
],
)
# Run local commands
# Local commands can be helpful for one-time tasks like installing
# project prerequisites. They can also manage long-lived processes
# for non-containerized services or dependencies.
#
# More info: https://docs.tilt.dev/local_resource.html
#
# local_resource('install-helm',
# cmd='which helm > /dev/null || brew install helm',
# # `cmd_bat`, when present, is used instead of `cmd` on Windows.
# cmd_bat=[
# 'powershell.exe',
# '-Noninteractive',
# '-Command',
# '& {if (!(Get-Command helm -ErrorAction SilentlyContinue)) {scoop install helm}}'
# ]
# )
# Teach tilt about our custom resources (Note, this may be intended for workloads)
# k8s_kind('authorizationpolicy')
# k8s_kind('requestauthentication')
# k8s_kind('virtualservice')
k8s_kind('pgadmin')
# Troubleshooting
def resource_name(id):
print('resource: {}'.format(id))
return id.name
workload_to_resource_function(resource_name)
# Apply Kubernetes manifests
# Tilt will build & push any necessary images, re-deploying your
# resources as they change.
#
# More info: https://docs.tilt.dev/api.html#api.k8s_yaml
#
def holos_yaml():
"""Return a k8s Deployment personalized for the developer."""
k8s_yaml_template = str(read_file('./hack/tilt/k8s.yaml'))
return k8s_yaml_template.format(
name=holos_server,
developer=developer,
namespace=namespace,
listen_port=listen_port,
metrics_port=metrics_port,
tz=os.getenv('TZ'),
)
# Customize a Kubernetes resource
# By default, Kubernetes resource names are automatically assigned
# based on objects in the YAML manifests, e.g. Deployment name.
#
# Tilt strives for sane defaults, so calling k8s_resource is
# optional, and you only need to pass the arguments you want to
# override.
#
# More info: https://docs.tilt.dev/api.html#api.k8s_resource
#
k8s_yaml(blob(holos_yaml()))
# Backend server process
k8s_resource(
workload=holos_server,
new_name=holos_backend,
objects=[
'{}:serviceaccount'.format(holos_server),
'{}:servicemonitor'.format(holos_server),
],
resource_deps=[compile_id],
links=[
link('https://{}.holos.dev.k2.ois.run/app/'.format(developer), "Holos Web UI")
],
)
# AuthorizationPolicy - Beyond Corp functionality
k8s_resource(
new_name=auth_id,
objects=[
'{}:virtualservice'.format(holos_server),
'{}-allow-groups:authorizationpolicy'.format(holos_server),
'{}-allow-nothing:authorizationpolicy'.format(holos_server),
'{}-allow-well-known-paths:authorizationpolicy'.format(holos_server),
'{}-auth:authorizationpolicy'.format(holos_server),
'{}:requestauthentication'.format(holos_server),
],
)
# Database
# Note: Tilt confuses the backup pods with the database server pods, so this code is careful to tease the pods
# apart so logs are streamed correctly.
# See: https://github.com/tilt-dev/tilt.specs/blob/master/resource_assembly.md
# pgAdmin Web UI
k8s_resource(
workload=pg_admin_name,
new_name=pg_admin,
port_forwards=[
port_forward(15050, 5050, pg_admin),
],
)
# Disabled because these don't group resources nicely
# k8s_kind('postgrescluster')
# Postgres database in-cluster
k8s_resource(
new_name=pg_cluster,
objects=['holos:postgrescluster'],
)
# Needed to select the database by label
# https://docs.tilt.dev/api.html#api.k8s_custom_deploy
k8s_custom_deploy(
pg_svc,
apply_cmd=['./hack/tilt/k8s-get-db-sts', pg_cluster_name],
delete_cmd=['echo', 'Skipping delete. Object managed by custom resource.'],
deps=[],
)
k8s_resource(
pg_svc,
port_forwards=[
port_forward(15432, 5432, 'psql'),
],
resource_deps=[pg_cluster]
)
# Run tests
local_resource(
tests_id,
'make test',
allow_parallel=True,
auto_init=False,
deps=developer_paths,
)
# Run linter
local_resource(
lint_id,
'make lint',
allow_parallel=True,
auto_init=False,
deps=developer_paths,
)
# UI Buttons for helpful things.
# Icons: https://fonts.google.com/icons
os.putenv("GH_FORCE_TTY", "80%")
cmd_button(
'{}:go-test-failfast'.format(tests_id),
argv=['./hack/tilt/go-test-failfast'],
resource=tests_id,
icon_name='quiz',
text='Fail Fast',
)
cmd_button(
'{}:issues'.format(holos_server),
argv=['./hack/tilt/gh-issues'],
resource=holos_backend,
icon_name='folder_data',
text='Issues',
)
cmd_button(
'{}:gh-issue-view'.format(holos_server),
argv=['./hack/tilt/gh-issue-view'],
resource=holos_backend,
icon_name='task',
text='View Issue',
)
cmd_button(
'{}:get-pgdb-creds'.format(holos_server),
argv=['./hack/tilt/get-pgdb-creds', pg_cluster_name, pg_database_name],
resource=pg_svc,
icon_name='lock_open_right',
text='DB Creds',
)
cmd_button(
'{}:get-pgdb-creds'.format(pg_admin_name),
argv=['./hack/tilt/get-pgdb-creds', pg_cluster_name, pg_database_name],
resource=pg_admin,
icon_name='lock_open_right',
text='DB Creds',
)
cmd_button(
'{}:get-pgadmin-creds'.format(pg_admin_name),
argv=['./hack/tilt/get-pgadmin-creds', pg_admin_name],
resource=pg_admin,
icon_name='lock_open_right',
text='pgAdmin Login',
)
print("✨ Tiltfile evaluated")


@@ -19,9 +19,10 @@ type BuildPlanSpec struct {
}
type BuildPlanComponents struct {
HelmChartList []HelmChart `json:"helmChartList,omitempty" yaml:"helmChartList,omitempty"`
KubernetesObjectsList []KubernetesObjects `json:"kubernetesObjectsList,omitempty" yaml:"kubernetesObjectsList,omitempty"`
KustomizeBuildList []KustomizeBuild `json:"kustomizeBuildList,omitempty" yaml:"kustomizeBuildList,omitempty"`
HelmChartList []HelmChart `json:"helmChartList,omitempty" yaml:"helmChartList,omitempty"`
KubernetesObjectsList []KubernetesObjects `json:"kubernetesObjectsList,omitempty" yaml:"kubernetesObjectsList,omitempty"`
KustomizeBuildList []KustomizeBuild `json:"kustomizeBuildList,omitempty" yaml:"kustomizeBuildList,omitempty"`
Resources map[string]KubernetesObjects `json:"resources,omitempty" yaml:"resources,omitempty"`
}
func (bp *BuildPlan) Validate() error {


@@ -8,9 +8,9 @@ import (
"strings"
"github.com/holos-run/holos"
"github.com/holos-run/holos/pkg/logger"
"github.com/holos-run/holos/pkg/util"
"github.com/holos-run/holos/pkg/wrapper"
"github.com/holos-run/holos/internal/errors"
"github.com/holos-run/holos/internal/logger"
"github.com/holos-run/holos/internal/util"
)
// A HelmChart represents a helm command to provide chart values in order to render kubernetes api objects.
@@ -35,21 +35,21 @@ type Repository struct {
URL string `json:"url"`
}
func (hc *HelmChart) Render(ctx context.Context, path holos.PathComponent) (*Result, error) {
func (hc *HelmChart) Render(ctx context.Context, path holos.InstancePath) (*Result, error) {
result := Result{HolosComponent: hc.HolosComponent}
if err := hc.helm(ctx, &result, path); err != nil {
return nil, err
}
result.addObjectMap(ctx, hc.APIObjectMap)
if err := result.kustomize(ctx); err != nil {
return nil, wrapper.Wrap(fmt.Errorf("could not kustomize: %w", err))
return nil, errors.Wrap(fmt.Errorf("could not kustomize: %w", err))
}
return &result, nil
}
// helm provides the values produced by CUE to helm template and returns
// the rendered kubernetes api objects in the result.
func (hc *HelmChart) helm(ctx context.Context, r *Result, path holos.PathComponent) error {
func (hc *HelmChart) helm(ctx context.Context, r *Result, path holos.InstancePath) error {
log := logger.FromContext(ctx).With("chart", hc.Chart.Name)
if hc.Chart.Name == "" {
log.WarnContext(ctx, "skipping helm: no chart name specified, use a different component type")
@@ -64,13 +64,13 @@ func (hc *HelmChart) helm(ctx context.Context, r *Result, path holos.PathCompone
out, err := util.RunCmd(ctx, "helm", "repo", "add", repo.Name, repo.URL)
if err != nil {
log.ErrorContext(ctx, "could not run helm", "stderr", out.Stderr.String(), "stdout", out.Stdout.String())
return wrapper.Wrap(fmt.Errorf("could not run helm repo add: %w", err))
return errors.Wrap(fmt.Errorf("could not run helm repo add: %w", err))
}
// Update repository
out, err = util.RunCmd(ctx, "helm", "repo", "update", repo.Name)
if err != nil {
log.ErrorContext(ctx, "could not run helm", "stderr", out.Stderr.String(), "stdout", out.Stdout.String())
return wrapper.Wrap(fmt.Errorf("could not run helm repo update: %w", err))
return errors.Wrap(fmt.Errorf("could not run helm repo update: %w", err))
}
} else {
log.DebugContext(ctx, "no chart repository url proceeding assuming oci chart")
@@ -85,13 +85,13 @@ func (hc *HelmChart) helm(ctx context.Context, r *Result, path holos.PathCompone
// Write values file
tempDir, err := os.MkdirTemp("", "holos")
if err != nil {
return wrapper.Wrap(fmt.Errorf("could not make temp dir: %w", err))
return errors.Wrap(fmt.Errorf("could not make temp dir: %w", err))
}
defer util.Remove(ctx, tempDir)
valuesPath := filepath.Join(tempDir, "values.yaml")
if err := os.WriteFile(valuesPath, []byte(hc.ValuesContent), 0644); err != nil {
return wrapper.Wrap(fmt.Errorf("could not write values: %w", err))
return errors.Wrap(fmt.Errorf("could not write values: %w", err))
}
log.DebugContext(ctx, "helm: wrote values", "path", valuesPath, "bytes", len(hc.ValuesContent))
@@ -112,7 +112,7 @@ func (hc *HelmChart) helm(ctx context.Context, r *Result, path holos.PathCompone
err = fmt.Errorf("%s: %w", line, err)
}
}
return wrapper.Wrap(fmt.Errorf("could not run helm template: %w", err))
return errors.Wrap(fmt.Errorf("could not run helm template: %w", err))
}
r.accumulatedOutput = helmOut.Stdout.String()
@@ -121,12 +121,12 @@ func (hc *HelmChart) helm(ctx context.Context, r *Result, path holos.PathCompone
}
// cacheChart stores a cached copy of Chart in the chart subdirectory of path.
func cacheChart(ctx context.Context, path holos.PathComponent, chartDir string, chart Chart) error {
func cacheChart(ctx context.Context, path holos.InstancePath, chartDir string, chart Chart) error {
log := logger.FromContext(ctx)
cacheTemp, err := os.MkdirTemp(string(path), chartDir)
if err != nil {
return wrapper.Wrap(fmt.Errorf("could not make temp dir: %w", err))
return errors.Wrap(fmt.Errorf("could not make temp dir: %w", err))
}
defer util.Remove(ctx, cacheTemp)
@@ -136,14 +136,30 @@ func cacheChart(ctx context.Context, path holos.PathComponent, chartDir string,
}
helmOut, err := util.RunCmd(ctx, "helm", "pull", "--destination", cacheTemp, "--untar=true", "--version", chart.Version, chartName)
if err != nil {
return wrapper.Wrap(fmt.Errorf("could not run helm pull: %w", err))
return errors.Wrap(fmt.Errorf("could not run helm pull: %w", err))
}
log.Debug("helm pull", "stdout", helmOut.Stdout, "stderr", helmOut.Stderr)
cachePath := filepath.Join(string(path), chartDir)
if err := os.Rename(cacheTemp, cachePath); err != nil {
return wrapper.Wrap(fmt.Errorf("could not rename: %w", err))
if err := os.MkdirAll(cachePath, 0777); err != nil {
return errors.Wrap(fmt.Errorf("could not mkdir: %w", err))
}
items, err := os.ReadDir(cacheTemp)
if err != nil {
return errors.Wrap(fmt.Errorf("could not read directory: %w", err))
}
for _, item := range items {
src := filepath.Join(cacheTemp, item.Name())
dst := filepath.Join(cachePath, item.Name())
log.DebugContext(ctx, "rename", "src", src, "dst", dst)
if err := os.Rename(src, dst); err != nil {
return errors.Wrap(fmt.Errorf("could not rename: %w", err))
}
}
log.InfoContext(ctx, "cached", "chart", chart.Name, "version", chart.Version, "path", cachePath)
return nil
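
The `cacheChart` hunk above replaces a single `os.Rename` of the temporary directory with a per-entry move into a directory created by `os.MkdirAll`. A plausible reason, given the commit message about multiple charts, is that renaming a directory onto an existing non-empty directory fails on common platforms, so a second chart could not be cached next to the first. The sketch below isolates that technique using only the standard library; `moveEntries` and `demo` are hypothetical names, not functions from the repository.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// moveEntries creates dstDir if needed, then renames each entry of srcDir
// into it. Hypothetical helper mirroring the loop in the diff.
func moveEntries(srcDir, dstDir string) error {
	if err := os.MkdirAll(dstDir, 0o777); err != nil {
		return fmt.Errorf("could not mkdir: %w", err)
	}
	items, err := os.ReadDir(srcDir)
	if err != nil {
		return fmt.Errorf("could not read directory: %w", err)
	}
	for _, item := range items {
		src := filepath.Join(srcDir, item.Name())
		dst := filepath.Join(dstDir, item.Name())
		if err := os.Rename(src, dst); err != nil {
			return fmt.Errorf("could not rename: %w", err)
		}
	}
	return nil
}

// demo simulates two separate pulls landing in one shared cache directory
// and returns the names found there afterwards.
func demo() ([]string, error) {
	root, err := os.MkdirTemp("", "cache-example")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)

	cache := filepath.Join(root, "vendor")
	for _, chart := range []string{"chart-a", "chart-b"} {
		tmp, err := os.MkdirTemp(root, "pull")
		if err != nil {
			return nil, err
		}
		if err := os.Mkdir(filepath.Join(tmp, chart), 0o755); err != nil {
			return nil, err
		}
		if err := moveEntries(tmp, cache); err != nil {
			return nil, err
		}
	}

	entries, err := os.ReadDir(cache)
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(entries))
	for _, e := range entries {
		names = append(names, e.Name())
	}
	return names, nil
}

func main() {
	names, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(names)
}
```

The sketch keeps source and destination under the same temporary root so that `os.Rename` stays on one filesystem, which matches the diff, where the temporary directory is created inside the component path.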


@@ -2,6 +2,7 @@ package v1alpha1
import (
"context"
"github.com/holos-run/holos"
)
@@ -13,7 +14,7 @@ type KubernetesObjects struct {
}
// Render produces kubernetes api objects from the APIObjectMap
func (o *KubernetesObjects) Render(ctx context.Context, path holos.PathComponent) (*Result, error) {
func (o *KubernetesObjects) Render(ctx context.Context, path holos.InstancePath) (*Result, error) {
result := Result{HolosComponent: o.HolosComponent}
result.addObjectMap(ctx, o.APIObjectMap)
return &result, nil


@@ -2,10 +2,11 @@ package v1alpha1
import (
"context"
"github.com/holos-run/holos"
"github.com/holos-run/holos/pkg/logger"
"github.com/holos-run/holos/pkg/util"
"github.com/holos-run/holos/pkg/wrapper"
"github.com/holos-run/holos/internal/errors"
"github.com/holos-run/holos/internal/logger"
"github.com/holos-run/holos/internal/util"
)
const KustomizeBuildKind = "KustomizeBuild"
@@ -29,14 +30,14 @@ type KustomizeBuild struct {
// Render produces a Result by executing kubectl kustomize on the holos
// component path. Useful for processing raw yaml files.
func (kb *KustomizeBuild) Render(ctx context.Context, path holos.PathComponent) (*Result, error) {
func (kb *KustomizeBuild) Render(ctx context.Context, path holos.InstancePath) (*Result, error) {
log := logger.FromContext(ctx)
result := Result{HolosComponent: kb.HolosComponent}
// Run kustomize.
kOut, err := util.RunCmd(ctx, "kubectl", "kustomize", string(path))
if err != nil {
log.ErrorContext(ctx, kOut.Stderr.String())
return nil, wrapper.Wrap(err)
return nil, errors.Wrap(err)
}
// Replace the accumulated output
result.accumulatedOutput = kOut.Stdout.String()


@@ -1,8 +1,14 @@
package v1alpha1
// Label is an arbitrary unique identifier. Defined as a type for clarity and type checking.
type Label string
// Kind is a kubernetes api object kind. Defined as a type for clarity and type checking.
type Kind string
// APIObjectMap is the shape of marshalled api objects returned from cue to the
// holos cli. A map is used to improve the clarity of error messages from cue.
type APIObjectMap map[string]map[string]string
type APIObjectMap map[Kind]map[Label]string
// FileContentMap is a map of file names to file contents.
type FileContentMap map[string]string
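
The hunk above gives the keys of `APIObjectMap` their own `Kind` and `Label` types. As a rough sketch of how such a map can still be walked in a deterministic order, assuming Go 1.21 or later for the `slices` package, the following sorts the typed keys directly. The type definitions are copied from the diff; `orderedObjects` and the sample data are hypothetical.

```go
package main

import (
	"fmt"
	"slices"
)

// Types as declared in the diff.
type Label string
type Kind string
type APIObjectMap map[Kind]map[Label]string

// orderedObjects returns the stored object strings sorted by kind, then by
// label, so repeated runs produce identical output. Hypothetical helper.
func orderedObjects(m APIObjectMap) []string {
	kinds := make([]Kind, 0, len(m))
	for kind := range m {
		kinds = append(kinds, kind)
	}
	// slices.Sort accepts any ordered type, including named string types.
	slices.Sort(kinds)

	var out []string
	for _, kind := range kinds {
		labels := make([]Label, 0, len(m[kind]))
		for label := range m[kind] {
			labels = append(labels, label)
		}
		slices.Sort(labels)
		for _, label := range labels {
			out = append(out, m[kind][label])
		}
	}
	return out
}

func main() {
	m := APIObjectMap{
		"Service":    {"web": "svc/web"},
		"Deployment": {"web": "deploy/web", "api": "deploy/api"},
	}
	for _, obj := range orderedObjects(m) {
		fmt.Println(obj)
	}
}
```

Because the key slices are `[]Kind` and `[]Label` rather than `[]string`, passing a label where a kind is expected becomes a compile-time error, which appears to be the point of the named types.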


@@ -2,12 +2,13 @@ package v1alpha1
import (
"context"
"github.com/holos-run/holos"
)
type Renderer interface {
GetKind() string
Render(ctx context.Context, path holos.PathComponent) (*Result, error)
Render(ctx context.Context, path holos.InstancePath) (*Result, error)
}
// Render produces a Result representing the kubernetes api objects to
@@ -16,6 +17,6 @@ type Renderer interface {
// conceptualized as a data pipeline, for example a component may render a
// result by first calling helm template, then passing the result through
// kustomize, then mixing in overlay api objects.
func Render(ctx context.Context, r Renderer, path holos.PathComponent) (*Result, error) {
func Render(ctx context.Context, r Renderer, path holos.InstancePath) (*Result, error) {
return r.Render(ctx, path)
}


@@ -3,12 +3,13 @@ package v1alpha1
import (
"context"
"fmt"
"github.com/holos-run/holos/pkg/logger"
"github.com/holos-run/holos/pkg/util"
"github.com/holos-run/holos/pkg/wrapper"
"os"
"path/filepath"
"slices"
"github.com/holos-run/holos/internal/errors"
"github.com/holos-run/holos/internal/logger"
"github.com/holos-run/holos/internal/util"
)
// Result is the build result for display or writing. Holos components Render the Result as a data pipeline.
@@ -40,7 +41,7 @@ func (r *Result) AccumulatedOutput() string {
func (r *Result) addObjectMap(ctx context.Context, objectMap APIObjectMap) {
log := logger.FromContext(ctx)
b := []byte(r.AccumulatedOutput())
kinds := make([]string, 0, len(objectMap))
kinds := make([]Kind, 0, len(objectMap))
// Sort the keys
for kind := range objectMap {
kinds = append(kinds, kind)
@@ -50,7 +51,7 @@ func (r *Result) addObjectMap(ctx context.Context, objectMap APIObjectMap) {
for _, kind := range kinds {
v := objectMap[kind]
// Sort the keys
names := make([]string, 0, len(v))
names := make([]Label, 0, len(v))
for name := range v {
names = append(names, name)
}
@@ -81,7 +82,7 @@ func (r *Result) kustomize(ctx context.Context) error {
}
tempDir, err := os.MkdirTemp("", "holos.kustomize")
if err != nil {
return wrapper.Wrap(err)
return errors.Wrap(err)
}
defer util.Remove(ctx, tempDir)
@@ -90,7 +91,7 @@ func (r *Result) kustomize(ctx context.Context) error {
b := []byte(r.AccumulatedOutput())
b = util.EnsureNewline(b)
if err := os.WriteFile(target, b, 0644); err != nil {
return wrapper.Wrap(fmt.Errorf("could not write resources: %w", err))
return errors.Wrap(fmt.Errorf("could not write resources: %w", err))
}
log.DebugContext(ctx, "wrote: "+target, "op", "write", "path", target, "bytes", len(b))
@@ -98,12 +99,12 @@ func (r *Result) kustomize(ctx context.Context) error {
for file, content := range r.KustomizeFiles {
target := filepath.Join(tempDir, file)
if err := os.MkdirAll(filepath.Dir(target), 0755); err != nil {
return wrapper.Wrap(err)
return errors.Wrap(err)
}
b := []byte(content)
b = util.EnsureNewline(b)
if err := os.WriteFile(target, b, 0644); err != nil {
return wrapper.Wrap(fmt.Errorf("could not write: %w", err))
return errors.Wrap(fmt.Errorf("could not write: %w", err))
}
log.DebugContext(ctx, "wrote: "+target, "op", "write", "path", target, "bytes", len(b))
}
@@ -112,7 +113,7 @@ func (r *Result) kustomize(ctx context.Context) error {
kOut, err := util.RunCmd(ctx, "kubectl", "kustomize", tempDir)
if err != nil {
log.ErrorContext(ctx, kOut.Stderr.String())
return wrapper.Wrap(err)
return errors.Wrap(err)
}
// Replace the accumulated output
r.accumulatedOutput = kOut.Stdout.String()
@@ -125,12 +126,12 @@ func (r *Result) Save(ctx context.Context, path string, content string) error {
dir := filepath.Dir(path)
if err := os.MkdirAll(dir, os.FileMode(0775)); err != nil {
log.WarnContext(ctx, "could not mkdir", "path", dir, "err", err)
return wrapper.Wrap(err)
return errors.Wrap(err)
}
// Write the kube api objects
if err := os.WriteFile(path, []byte(content), os.FileMode(0644)); err != nil {
log.WarnContext(ctx, "could not write", "path", path, "err", err)
return wrapper.Wrap(err)
return errors.Wrap(err)
}
log.DebugContext(ctx, "out: wrote "+path, "action", "write", "path", path, "status", "ok")
return nil
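
Throughout these hunks, errors are annotated with `fmt.Errorf("...: %w", err)` before being passed to `errors.Wrap`. The repository's `internal/errors` package is not shown in this diff, so the sketch below covers only the standard library half of the pattern: `%w` keeps the original error reachable through `errors.Is`. The helper names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// writeValues writes values.yaml into dir and annotates any failure while
// preserving the underlying error with %w. Hypothetical helper.
func writeValues(dir string, content []byte) error {
	path := filepath.Join(dir, "values.yaml")
	if err := os.WriteFile(path, content, 0o644); err != nil {
		return fmt.Errorf("could not write values: %w", err)
	}
	return nil
}

// missingDirIsDetected writes into a directory that does not exist and
// reports whether the wrapped error still matches fs.ErrNotExist.
func missingDirIsDetected() bool {
	base, err := os.MkdirTemp("", "wrap-example")
	if err != nil {
		return false
	}
	defer os.RemoveAll(base)

	err = writeValues(filepath.Join(base, "does-not-exist"), []byte("a: b\n"))
	return errors.Is(err, fs.ErrNotExist)
}

func main() {
	fmt.Println(missingDirIsDetected())
}
```

Callers can therefore branch on the cause of a failure even after several layers have added context to the message.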

24
buf.gen.yaml Normal file

@@ -0,0 +1,24 @@
# Generates gRPC and ConnectRPC bindings for Go and TypeScript
#
# Note: protoc-gen-connect-query is the primary method of wiring up the React
# frontend.
version: v1
plugins:
- plugin: go
out: service/gen
opt: paths=source_relative
- plugin: connect-go
out: service/gen
opt: paths=source_relative
- plugin: es
out: internal/frontend/holos/gen
opt:
- target=ts
- plugin: connect-es
out: internal/frontend/holos/gen
opt:
- target=ts
- plugin: connect-query
out: internal/frontend/holos/gen
opt:
- target=ts

8
buf.lock Normal file

@@ -0,0 +1,8 @@
# Generated by buf. DO NOT EDIT.
version: v1
deps:
- remote: buf.build
owner: bufbuild
repository: protovalidate
commit: b983156c5e994cc9892e0ce3e64e17e0
digest: shake256:fb47a62989d38c2529bcc5cd86ded43d800eb84cee82b42b9e8a9e815d4ee8134a0fb9d0ce8299b27c2d2bbb7d6ade0c4ad5a8a4d467e1e2c7ca619ae9f634e2

3
buf.work.yaml Normal file

@@ -0,0 +1,3 @@
version: v1
directories:
- service


@@ -1,8 +1,9 @@
package main
import (
"github.com/holos-run/holos/pkg/cli"
"os"
"github.com/holos-run/holos/internal/cli"
)
func main() {


@@ -1,10 +1,11 @@
package main
import (
"github.com/holos-run/holos/pkg/cli"
"github.com/rogpeppe/go-internal/testscript"
"os"
"testing"
"github.com/holos-run/holos/internal/cli"
"github.com/rogpeppe/go-internal/testscript"
)
func TestMain(m *testing.M) {


@@ -0,0 +1,37 @@
package holos
import ap "security.istio.io/authorizationpolicy/v1"
// #AuthPolicyRules represents AuthorizationPolicy rules for hosts that need specialized treatment. Entries in this struct are excluded from the blank ingressauth AuthorizationPolicy governing the ingressgateway and included in a specialized policy.
#AuthPolicyRules: {
// AuthProxySpec represents the identity provider configuration
AuthProxySpec: #AuthProxySpec & #Platform.authproxy
// Hosts are hosts that need specialized treatment
hosts: {
[Name=_]: {
// name is the fully qualified hostname, a Host: header value.
name: Name
// slug is the resource name prefix
slug: string
// Refer to https://istio.io/latest/docs/reference/config/security/authorization-policy/#Rule
spec: ap.#AuthorizationPolicySpec & {
action: "CUSTOM"
provider: name: AuthProxySpec.provider
selector: matchLabels: istio: "ingressgateway"
}
}
}
objects: #APIObjects & {
for Host in hosts {
apiObjects: {
AuthorizationPolicy: "\(Host.slug)-custom": {
metadata: namespace: "istio-ingress"
metadata: name: "\(Host.slug)-custom"
spec: Host.spec
}
}
}
}
}

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -0,0 +1,189 @@
// Code generated by timoni. DO NOT EDIT.
//timoni:generate timoni vendor crd -f /home/jeff/workspace/holos-run/holos-infra/deploy/clusters/k2/components/prod-platform-argocd/prod-platform-argocd.gen.yaml
package v1alpha1
import "strings"
// AppProject provides a logical grouping of applications,
// providing controls for: * where the apps may deploy to
// (cluster whitelist) * what may be deployed (repository
// whitelist, resource whitelist/blacklist) * who can access
// these applications (roles, OIDC group claims bindings) * and
// what they can do (RBAC policies) * automation access to these
// roles (JWT tokens)
#AppProject: {
// APIVersion defines the versioned schema of this representation
// of an object. Servers should convert recognized schemas to the
// latest internal value, and may reject unrecognized values.
// More info:
// https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
apiVersion: "argoproj.io/v1alpha1"
// Kind is a string value representing the REST resource this
// object represents. Servers may infer this from the endpoint
// the client submits requests to. Cannot be updated. In
// CamelCase. More info:
// https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
kind: "AppProject"
metadata: {
name!: strings.MaxRunes(253) & strings.MinRunes(1) & {
string
}
namespace!: strings.MaxRunes(63) & strings.MinRunes(1) & {
string
}
labels?: {
[string]: string
}
annotations?: {
[string]: string
}
}
// AppProjectSpec is the specification of an AppProject
spec!: #AppProjectSpec
}
// AppProjectSpec is the specification of an AppProject
#AppProjectSpec: {
// ClusterResourceBlacklist contains list of blacklisted cluster
// level resources
clusterResourceBlacklist?: [...{
group: string
kind: string
}]
// ClusterResourceWhitelist contains list of whitelisted cluster
// level resources
clusterResourceWhitelist?: [...{
group: string
kind: string
}]
// Description contains optional project description
description?: string
// Destinations contains list of destinations available for
// deployment
destinations?: [...{
// Name is an alternate way of specifying the target cluster by
// its symbolic name. This must be set if Server is not set.
name?: string
// Namespace specifies the target namespace for the application's
// resources. The namespace will only be set for namespace-scoped
// resources that have not set a value for .metadata.namespace
namespace?: string
// Server specifies the URL of the target cluster's Kubernetes
// control plane API. This must be set if Name is not set.
server?: string
}]
// NamespaceResourceBlacklist contains list of blacklisted
// namespace level resources
namespaceResourceBlacklist?: [...{
group: string
kind: string
}]
// NamespaceResourceWhitelist contains list of whitelisted
// namespace level resources
namespaceResourceWhitelist?: [...{
group: string
kind: string
}]
// OrphanedResources specifies if controller should monitor
// orphaned resources of apps in this project
orphanedResources?: {
// Ignore contains a list of resources that are to be excluded
// from orphaned resources monitoring
ignore?: [...{
group?: string
kind?: string
name?: string
}]
// Warn indicates if warning condition should be created for apps
// which have orphaned resources
warn?: bool
}
// PermitOnlyProjectScopedClusters determines whether destinations
// can only reference clusters which are project-scoped
permitOnlyProjectScopedClusters?: bool
// Roles are user defined RBAC roles associated with this project
roles?: [...{
// Description is a description of the role
description?: string
// Groups are a list of OIDC group claims bound to this role
groups?: [...string]
// JWTTokens are a list of generated JWT tokens bound to this role
jwtTokens?: [...{
exp?: int
iat: int
id?: string
}]
// Name is a name for this role
name: string
// Policies stores a list of casbin-formatted strings that define
// access policies for the role in the project
policies?: [...string]
}]
// SignatureKeys contains a list of PGP key IDs that commits in
// Git must be signed with in order to be allowed for sync
signatureKeys?: [...{
// The ID of the key in hexadecimal notation
keyID: string
}]
// SourceNamespaces defines the namespaces application resources
// are allowed to be created in
sourceNamespaces?: [...string]
// SourceRepos contains list of repository URLs which can be used
// for deployment
sourceRepos?: [...string]
// SyncWindows controls when syncs can be run for apps in this
// project
syncWindows?: [...{
// Applications contains a list of applications that the window
// will apply to
applications?: [...string]
// Clusters contains a list of clusters that the window will apply
// to
clusters?: [...string]
// Duration is the amount of time the sync window will be open
duration?: string
// Kind defines if the window allows or blocks syncs
kind?: string
// ManualSync enables manual syncs when they would otherwise be
// blocked
manualSync?: bool
// Namespaces contains a list of namespaces that the window will
// apply to
namespaces?: [...string]
// Schedule is the time the window will begin, specified in cron
// format
schedule?: string
// TimeZone of the sync that will be applied to the schedule
timeZone?: string
}]
}
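
The fields above read like an Argo CD AppProject spec. A minimal sketch of a value using a few of them might look like the following. The package name, the top-level `spec` field, and every name, URL, group, and schedule are placeholders, and the `kind` value and policy string are assumptions that the schema itself does not constrain.

```cue
package example

// Hypothetical project spec fragment; all values are illustrative only.
spec: {
	description: "Example project"
	sourceRepos: ["https://example.com/org/repo.git"]
	destinations: [{
		// Either server or name identifies the target cluster.
		server:    "https://kubernetes.default.svc"
		namespace: "example"
	}]
	roles: [{
		name: "read-only"
		groups: ["example-viewers"]
		// Assumed casbin policy syntax; the schema only requires strings.
		policies: ["p, proj:example:read-only, applications, get, example/*, allow"]
	}]
	syncWindows: [{
		// "allow" is an assumed value; the schema types kind as a plain string.
		kind:     "allow"
		schedule: "0 22 * * *"
		duration: "1h"
		applications: ["*"]
		manualSync: true
	}]
}
```

Because the list element structs declare `group`, `kind`, and `name` (for roles) as regular rather than optional fields, omitting them from an element leaves the value incomplete when exported.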


@@ -19,7 +19,8 @@ package v1alpha1
}
#BuildPlanComponents: {
-helmCharts?: [...#HelmChart] @go(HelmCharts,[]HelmChart)
-kubernetesObjects?: [...#KubernetesObjects] @go(KubernetesObjects,[]KubernetesObjects)
-kustomizeBuilds?: [...#KustomizeBuild] @go(KustomizeBuilds,[]KustomizeBuild)
+helmChartList?: [...#HelmChart] @go(HelmChartList,[]HelmChart)
+kubernetesObjectsList?: [...#KubernetesObjects] @go(KubernetesObjectsList,[]KubernetesObjects)
+kustomizeBuildList?: [...#KustomizeBuild] @go(KustomizeBuildList,[]KustomizeBuild)
resources?: {[string]: #KubernetesObjects} @go(Resources,map[string]KubernetesObjects)
}


@@ -11,5 +11,5 @@ package v1alpha1
// ChartDir is the directory name created in the holos component directory to cache a chart.
#ChartDir: "vendor"
-// ResourcesFile is the file name used to store component output when post processing with kustomize.
+// ResourcesFile is the file name used to store component output when post-processing with kustomize.
#ResourcesFile: "resources.yaml"


@@ -4,6 +4,12 @@
package v1alpha1
// Label is an arbitrary unique identifier. Defined as a type for clarity and type checking.
#Label: string
// Kind is a kubernetes api object kind. Defined as a type for clarity and type checking.
#Kind: string
// APIObjectMap is the shape of marshalled api objects returned from cue to the
// holos cli. A map is used to improve the clarity of error messages from cue.
#APIObjectMap: {[string]: [string]: string}
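
As a rough illustration of the `#APIObjectMap` shape, the sketch below assumes the outer key is a `#Kind`, the inner key is a `#Label`, and the leaf is the marshalled object text. The adjacent definitions suggest this reading, but the schema itself only says string to string to string. The package name, field name, and payloads are hypothetical.

```cue
package example

// Same shape as #APIObjectMap above: outer key -> inner key -> string.
#APIObjectMap: {[string]: [string]: string}

// Hypothetical value: one marshalled object per kind and label.
objects: #APIObjectMap & {
	Namespace: example: "apiVersion: v1\nkind: Namespace\nmetadata:\n  name: example\n"
	ConfigMap: settings: "apiVersion: v1\nkind: ConfigMap\nmetadata:\n  name: settings\n"
}
```

With a map, a bad leaf value is reported under a path such as `objects.ConfigMap.settings` instead of a list index, which is presumably the clarity the comment refers to.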

File diff suppressed because it is too large


@@ -0,0 +1,546 @@
// Code generated by timoni. DO NOT EDIT.
//timoni:generate timoni vendor crd -f /home/jeff/workspace/holos-run/holos-infra/deploy/clusters/k2/components/prod-platform-monitoring/prod-platform-monitoring.gen.yaml
package v1
import "strings"
// PodMonitor defines monitoring for a set of pods.
#PodMonitor: {
// APIVersion defines the versioned schema of this representation
// of an object. Servers should convert recognized schemas to the
// latest internal value, and may reject unrecognized values.
// More info:
// https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
apiVersion: "monitoring.coreos.com/v1"
// Kind is a string value representing the REST resource this
// object represents. Servers may infer this from the endpoint
// the client submits requests to. Cannot be updated. In
// CamelCase. More info:
// https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
kind: "PodMonitor"
metadata!: {
name!: strings.MaxRunes(253) & strings.MinRunes(1) & {
string
}
namespace!: strings.MaxRunes(63) & strings.MinRunes(1) & {
string
}
labels?: {
[string]: string
}
annotations?: {
[string]: string
}
}
// Specification of desired Pod selection for target discovery by
// Prometheus.
spec!: #PodMonitorSpec
}
// Specification of desired Pod selection for target discovery by
// Prometheus.
#PodMonitorSpec: {
attachMetadata?: {
// When set to true, Prometheus must have the `get` permission on
// the `Nodes` objects.
node?: bool
}
// The label to use to retrieve the job name from. `jobLabel`
// selects the label from the associated Kubernetes `Pod` object
// which will be used as the `job` label for all metrics.
// For example if `jobLabel` is set to `foo` and the Kubernetes
// `Pod` object is labeled with `foo: bar`, then Prometheus adds
// the `job="bar"` label to all ingested metrics.
// If the value of this field is empty, the `job` label of the
// metrics defaults to the namespace and name of the PodMonitor
// object (e.g. `<namespace>/<name>`).
jobLabel?: string
// Per-scrape limit on the number of targets dropped by relabeling
// that will be kept in memory. 0 means no limit.
// It requires Prometheus >= v2.47.0.
keepDroppedTargets?: int
// Per-scrape limit on number of labels that will be accepted for
// a sample.
// It requires Prometheus >= v2.27.0.
labelLimit?: int
// Per-scrape limit on the length of label names that will be
// accepted for a sample.
// It requires Prometheus >= v2.27.0.
labelNameLengthLimit?: int
// Per-scrape limit on the length of label values that will be
// accepted for a sample.
// It requires Prometheus >= v2.27.0.
labelValueLengthLimit?: int
// Selector to select which namespaces the Kubernetes `Pods`
// objects are discovered from.
namespaceSelector?: {
// Boolean describing whether all namespaces are selected in
// contrast to a list restricting them.
any?: bool
// List of namespace names to select from.
matchNames?: [...string]
}
// List of endpoints part of this PodMonitor.
podMetricsEndpoints?: [...{
// `authorization` configures the Authorization header credentials
// to use when scraping the target.
// Cannot be set at the same time as `basicAuth`, or `oauth2`.
authorization?: {
// Selects a key of a Secret in the namespace that contains the
// credentials for authentication.
credentials?: {
// The key of the secret to select from. Must be a valid secret
// key.
key: string
// Name of the referent. More info:
// https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// TODO: Add other useful fields. apiVersion, kind, uid?
name?: string
// Specify whether the Secret or its key must be defined
optional?: bool
}
// Defines the authentication type. The value is case-insensitive.
// "Basic" is not a supported value.
// Default: "Bearer"
type?: string
}
// `basicAuth` configures the Basic Authentication credentials to
// use when scraping the target.
// Cannot be set at the same time as `authorization`, or `oauth2`.
basicAuth?: {
// `password` specifies a key of a Secret containing the password
// for authentication.
password?: {
// The key of the secret to select from. Must be a valid secret
// key.
key: string
// Name of the referent. More info:
// https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// TODO: Add other useful fields. apiVersion, kind, uid?
name?: string
// Specify whether the Secret or its key must be defined
optional?: bool
}
// `username` specifies a key of a Secret containing the username
// for authentication.
username?: {
// The key of the secret to select from. Must be a valid secret
// key.
key: string
// Name of the referent. More info:
// https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// TODO: Add other useful fields. apiVersion, kind, uid?
name?: string
// Specify whether the Secret or its key must be defined
optional?: bool
}
}
// `bearerTokenSecret` specifies a key of a Secret containing the
// bearer token for scraping targets. The secret needs to be in
// the same namespace as the PodMonitor object and readable by
// the Prometheus Operator.
// Deprecated: use `authorization` instead.
bearerTokenSecret?: {
// The key of the secret to select from. Must be a valid secret
// key.
key: string
// Name of the referent. More info:
// https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// TODO: Add other useful fields. apiVersion, kind, uid?
name?: string
// Specify whether the Secret or its key must be defined
optional?: bool
}
// `enableHttp2` can be used to disable HTTP2 when scraping the
// target.
enableHttp2?: bool
// When true, the pods which are not running (e.g. either in
// Failed or Succeeded state) are dropped during the target
// discovery.
// If unset, the filtering is enabled.
// More info:
// https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase
filterRunning?: bool
// `followRedirects` defines whether the scrape requests should
// follow HTTP 3xx redirects.
followRedirects?: bool
// When true, `honorLabels` preserves the metric's labels when
// they collide with the target's labels.
honorLabels?: bool
// `honorTimestamps` controls whether Prometheus preserves the
// timestamps when exposed by the target.
honorTimestamps?: bool
// Interval at which Prometheus scrapes the metrics from the
// target.
// If empty, Prometheus uses the global scrape interval.
interval?: =~"^(0|(([0-9]+)y)?(([0-9]+)w)?(([0-9]+)d)?(([0-9]+)h)?(([0-9]+)m)?(([0-9]+)s)?(([0-9]+)ms)?)$"
// `metricRelabelings` configures the relabeling rules to apply to
// the samples before ingestion.
metricRelabelings?: [...{
// Action to perform based on the regex matching.
// `Uppercase` and `Lowercase` actions require Prometheus >=
// v2.36.0. `DropEqual` and `KeepEqual` actions require
// Prometheus >= v2.41.0.
// Default: "Replace"
action?: "replace" | "Replace" | "keep" | "Keep" | "drop" | "Drop" | "hashmod" | "HashMod" | "labelmap" | "LabelMap" | "labeldrop" | "LabelDrop" | "labelkeep" | "LabelKeep" | "lowercase" | "Lowercase" | "uppercase" | "Uppercase" | "keepequal" | "KeepEqual" | "dropequal" | "DropEqual" | *"replace"
// Modulus to take of the hash of the source label values.
// Only applicable when the action is `HashMod`.
modulus?: int
// Regular expression against which the extracted value is
// matched.
regex?: string
// Replacement value against which a Replace action is performed
// if the regular expression matches.
// Regex capture groups are available.
replacement?: string
// Separator is the string between concatenated SourceLabels.
separator?: string
// The source labels select values from existing labels. Their
// content is concatenated using the configured Separator and
// matched against the configured regular expression.
sourceLabels?: [...=~"^[a-zA-Z_][a-zA-Z0-9_]*$"]
// Label to which the resulting string is written in a
// replacement.
// It is mandatory for `Replace`, `HashMod`, `Lowercase`,
// `Uppercase`, `KeepEqual` and `DropEqual` actions.
// Regex capture groups are available.
targetLabel?: string
}]
// `oauth2` configures the OAuth2 settings to use when scraping
// the target.
// It requires Prometheus >= 2.27.0.
// Cannot be set at the same time as `authorization`, or
// `basicAuth`.
oauth2?: {
// `clientId` specifies a key of a Secret or ConfigMap containing
// the OAuth2 client's ID.
clientId: {
// ConfigMap containing data to use for the targets.
configMap?: {
// The key to select.
key: string
// Name of the referent. More info:
// https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// TODO: Add other useful fields. apiVersion, kind, uid?
name?: string
// Specify whether the ConfigMap or its key must be defined
optional?: bool
}
// Secret containing data to use for the targets.
secret?: {
// The key of the secret to select from. Must be a valid secret
// key.
key: string
// Name of the referent. More info:
// https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// TODO: Add other useful fields. apiVersion, kind, uid?
name?: string
// Specify whether the Secret or its key must be defined
optional?: bool
}
}
// `clientSecret` specifies a key of a Secret containing the
// OAuth2 client's secret.
clientSecret: {
// The key of the secret to select from. Must be a valid secret
// key.
key: string
// Name of the referent. More info:
// https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// TODO: Add other useful fields. apiVersion, kind, uid?
name?: string
// Specify whether the Secret or its key must be defined
optional?: bool
}
// `endpointParams` configures the HTTP parameters to append to
// the token URL.
endpointParams?: {
[string]: string
}
// `scopes` defines the OAuth2 scopes used for the token request.
scopes?: [...string]
// `tokenUrl` configures the URL to fetch the token from.
tokenUrl: strings.MinRunes(1)
}
// `params` define optional HTTP URL parameters.
params?: {
[string]: [...string]
}
// HTTP path from which to scrape for metrics.
// If empty, Prometheus uses the default value (e.g. `/metrics`).
path?: string
// Name of the Pod port which this endpoint refers to.
// It takes precedence over `targetPort`.
port?: string
// `proxyUrl` configures the HTTP Proxy URL (e.g.
// "http://proxyserver:2195") to go through when scraping the
// target.
proxyUrl?: string
// `relabelings` configures the relabeling rules to apply to the
// target's metadata labels.
// The Operator automatically adds relabelings for a few standard
// Kubernetes fields.
// The original scrape job's name is available via the
// `__tmp_prometheus_job_name` label.
// More info:
// https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config
relabelings?: [...{
// Action to perform based on the regex matching.
// `Uppercase` and `Lowercase` actions require Prometheus >=
// v2.36.0. `DropEqual` and `KeepEqual` actions require
// Prometheus >= v2.41.0.
// Default: "Replace"
action?: "replace" | "Replace" | "keep" | "Keep" | "drop" | "Drop" | "hashmod" | "HashMod" | "labelmap" | "LabelMap" | "labeldrop" | "LabelDrop" | "labelkeep" | "LabelKeep" | "lowercase" | "Lowercase" | "uppercase" | "Uppercase" | "keepequal" | "KeepEqual" | "dropequal" | "DropEqual" | *"replace"
// Modulus to take of the hash of the source label values.
// Only applicable when the action is `HashMod`.
modulus?: int
// Regular expression against which the extracted value is
// matched.
regex?: string
// Replacement value against which a Replace action is performed
// if the regular expression matches.
// Regex capture groups are available.
replacement?: string
// Separator is the string between concatenated SourceLabels.
separator?: string
// The source labels select values from existing labels. Their
// content is concatenated using the configured Separator and
// matched against the configured regular expression.
sourceLabels?: [...=~"^[a-zA-Z_][a-zA-Z0-9_]*$"]
// Label to which the resulting string is written in a
// replacement.
// It is mandatory for `Replace`, `HashMod`, `Lowercase`,
// `Uppercase`, `KeepEqual` and `DropEqual` actions.
// Regex capture groups are available.
targetLabel?: string
}]
// HTTP scheme to use for scraping.
// `http` and `https` are the expected values unless you rewrite
// the `__scheme__` label via relabeling.
// If empty, Prometheus uses the default value `http`.
scheme?: "http" | "https"
// Timeout after which Prometheus considers the scrape to be
// failed.
// If empty, Prometheus uses the global scrape timeout unless it
// is less than the target's scrape interval value, in which case
// the latter is used.
scrapeTimeout?: =~"^(0|(([0-9]+)y)?(([0-9]+)w)?(([0-9]+)d)?(([0-9]+)h)?(([0-9]+)m)?(([0-9]+)s)?(([0-9]+)ms)?)$"
// Name or number of the target port of the `Pod` object behind
// the Service. The port must be specified with the container
// port property.
// Deprecated: use 'port' instead.
targetPort?: (int | string) & {
string
}
// TLS configuration to use when scraping the target.
tlsConfig?: {
// Certificate authority used when verifying server certificates.
ca?: {
// ConfigMap containing data to use for the targets.
configMap?: {
// The key to select.
key: string
// Name of the referent. More info:
// https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// TODO: Add other useful fields. apiVersion, kind, uid?
name?: string
// Specify whether the ConfigMap or its key must be defined
optional?: bool
}
// Secret containing data to use for the targets.
secret?: {
// The key of the secret to select from. Must be a valid secret
// key.
key: string
// Name of the referent. More info:
// https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// TODO: Add other useful fields. apiVersion, kind, uid?
name?: string
// Specify whether the Secret or its key must be defined
optional?: bool
}
}
// Client certificate to present when doing client-authentication.
cert?: {
// ConfigMap containing data to use for the targets.
configMap?: {
// The key to select.
key: string
// Name of the referent. More info:
// https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// TODO: Add other useful fields. apiVersion, kind, uid?
name?: string
// Specify whether the ConfigMap or its key must be defined
optional?: bool
}
// Secret containing data to use for the targets.
secret?: {
// The key of the secret to select from. Must be a valid secret
// key.
key: string
// Name of the referent. More info:
// https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// TODO: Add other useful fields. apiVersion, kind, uid?
name?: string
// Specify whether the Secret or its key must be defined
optional?: bool
}
}
// Disable target certificate validation.
insecureSkipVerify?: bool
// Secret containing the client key file for the targets.
keySecret?: {
// The key of the secret to select from. Must be a valid secret
// key.
key: string
// Name of the referent. More info:
// https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// TODO: Add other useful fields. apiVersion, kind, uid?
name?: string
// Specify whether the Secret or its key must be defined
optional?: bool
}
// Used to verify the hostname for the targets.
serverName?: string
}
// `trackTimestampsStaleness` defines whether Prometheus tracks
// staleness of the metrics that have an explicit timestamp
// present in scraped data. Has no effect if `honorTimestamps` is
// false.
// It requires Prometheus >= v2.48.0.
trackTimestampsStaleness?: bool
}]
// `podTargetLabels` defines the labels which are transferred from
// the associated Kubernetes `Pod` object onto the ingested
// metrics.
podTargetLabels?: [...string]
// `sampleLimit` defines a per-scrape limit on the number of
// scraped samples that will be accepted.
sampleLimit?: int
// The scrape class to apply.
scrapeClass?: strings.MinRunes(1)
// `scrapeProtocols` defines the protocols to negotiate during a
// scrape. It tells clients the protocols supported by Prometheus
// in order of preference (from most to least preferred).
// If unset, Prometheus uses its default value.
// It requires Prometheus >= v2.49.0.
scrapeProtocols?: [..."PrometheusProto" | "OpenMetricsText0.0.1" | "OpenMetricsText1.0.0" | "PrometheusText0.0.4"]
// Label selector to select the Kubernetes `Pod` objects.
selector: {
// matchExpressions is a list of label selector requirements. The
// requirements are ANDed.
matchExpressions?: [...{
// key is the label key that the selector applies to.
key: string
// operator represents a key's relationship to a set of values.
// Valid operators are In, NotIn, Exists and DoesNotExist.
operator: string
// values is an array of string values. If the operator is In or
// NotIn, the values array must be non-empty. If the operator is
// Exists or DoesNotExist, the values array must be empty. This
// array is replaced during a strategic merge patch.
values?: [...string]
}]
// matchLabels is a map of {key,value} pairs. A single {key,value}
// in the matchLabels map is equivalent to an element of
// matchExpressions, whose key field is "key", the operator is
// "In", and the values array contains only "value". The
// requirements are ANDed.
matchLabels?: {
[string]: string
}
}
// `targetLimit` defines a limit on the number of scraped targets
// that will be accepted.
targetLimit?: int
}
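
A minimal sketch of a value that should unify with the `#PodMonitor` schema above, assuming it lives in the same `v1` package as the generated file. The field name `examplePodMonitor` and all names, labels, and ports are hypothetical.

```cue
package v1

// Hypothetical PodMonitor; every concrete value here is a placeholder.
examplePodMonitor: #PodMonitor & {
	metadata: {
		name:      "example-app"
		namespace: "monitoring"
	}
	spec: {
		// selector is the only regular (non-optional) field of the spec.
		selector: matchLabels: "app.kubernetes.io/name": "example-app"
		namespaceSelector: matchNames: ["example"]
		podMetricsEndpoints: [{
			port:   "metrics"
			path:   "/metrics"
			scheme: "http"
			// Both durations must match the duration pattern in the schema.
			interval:      "30s"
			scrapeTimeout: "10s"
		}]
	}
}
```

A value such as `interval: "thirty seconds"` would be rejected by the duration pattern, and a missing `metadata.namespace` would fail the required field constraint (`namespace!`).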


@@ -0,0 +1,536 @@
// Code generated by timoni. DO NOT EDIT.
//timoni:generate timoni vendor crd -f /home/jeff/workspace/holos-run/holos-infra/deploy/clusters/k2/components/prod-platform-monitoring/prod-platform-monitoring.gen.yaml
package v1
import "strings"
// Probe defines monitoring for a set of static targets or
// ingresses.
#Probe: {
// APIVersion defines the versioned schema of this representation
// of an object. Servers should convert recognized schemas to the
// latest internal value, and may reject unrecognized values.
// More info:
// https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
apiVersion: "monitoring.coreos.com/v1"
// Kind is a string value representing the REST resource this
// object represents. Servers may infer this from the endpoint
// the client submits requests to. Cannot be updated. In
// CamelCase. More info:
// https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
kind: "Probe"
metadata!: {
name!: strings.MaxRunes(253) & strings.MinRunes(1) & {
string
}
namespace!: strings.MaxRunes(63) & strings.MinRunes(1) & {
string
}
labels?: {
[string]: string
}
annotations?: {
[string]: string
}
}
// Specification of desired Ingress selection for target discovery
// by Prometheus.
spec!: #ProbeSpec
}
// Specification of desired Ingress selection for target discovery
// by Prometheus.
#ProbeSpec: {
// Authorization section for this endpoint
authorization?: {
// Selects a key of a Secret in the namespace that contains the
// credentials for authentication.
credentials?: {
// The key of the secret to select from. Must be a valid secret
// key.
key: string
// Name of the referent. More info:
// https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// TODO: Add other useful fields. apiVersion, kind, uid?
name?: string
// Specify whether the Secret or its key must be defined
optional?: bool
}
// Defines the authentication type. The value is case-insensitive.
// "Basic" is not a supported value.
// Default: "Bearer"
type?: string
}
// BasicAuth allows an endpoint to authenticate over basic
// authentication. More info:
// https://prometheus.io/docs/operating/configuration/#endpoint
basicAuth?: {
// `password` specifies a key of a Secret containing the password
// for authentication.
password?: {
// The key of the secret to select from. Must be a valid secret
// key.
key: string
// Name of the referent. More info:
// https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// TODO: Add other useful fields. apiVersion, kind, uid?
name?: string
// Specify whether the Secret or its key must be defined
optional?: bool
}
// `username` specifies a key of a Secret containing the username
// for authentication.
username?: {
// The key of the secret to select from. Must be a valid secret
// key.
key: string
// Name of the referent. More info:
// https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// TODO: Add other useful fields. apiVersion, kind, uid?
name?: string
// Specify whether the Secret or its key must be defined
optional?: bool
}
}
// Secret to mount to read bearer token for scraping targets. The
// secret needs to be in the same namespace as the probe and
// accessible by the Prometheus Operator.
bearerTokenSecret?: {
// The key of the secret to select from. Must be a valid secret
// key.
key: string
// Name of the referent. More info:
// https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// TODO: Add other useful fields. apiVersion, kind, uid?
name?: string
// Specify whether the Secret or its key must be defined
optional?: bool
}
// Interval at which targets are probed using the configured
// prober. If not specified Prometheus' global scrape interval is
// used.
interval?: =~"^(0|(([0-9]+)y)?(([0-9]+)w)?(([0-9]+)d)?(([0-9]+)h)?(([0-9]+)m)?(([0-9]+)s)?(([0-9]+)ms)?)$"
// The job name assigned to scraped metrics by default.
jobName?: string
// Per-scrape limit on the number of targets dropped by relabeling
// that will be kept in memory. 0 means no limit.
// It requires Prometheus >= v2.47.0.
keepDroppedTargets?: int
// Per-scrape limit on number of labels that will be accepted for
// a sample. Only valid in Prometheus versions 2.27.0 and newer.
labelLimit?: int
// Per-scrape limit on the length of label names that will be
// accepted for a sample. Only valid in Prometheus versions 2.27.0
// and newer.
labelNameLengthLimit?: int
// Per-scrape limit on the length of label values that will be
// accepted for a sample. Only valid in Prometheus versions
// 2.27.0 and newer.
labelValueLengthLimit?: int
// MetricRelabelConfigs to apply to samples before ingestion.
metricRelabelings?: [...{
// Action to perform based on the regex matching.
// `Uppercase` and `Lowercase` actions require Prometheus >=
// v2.36.0. `DropEqual` and `KeepEqual` actions require
// Prometheus >= v2.41.0.
// Default: "Replace"
action?: "replace" | "Replace" | "keep" | "Keep" | "drop" | "Drop" | "hashmod" | "HashMod" | "labelmap" | "LabelMap" | "labeldrop" | "LabelDrop" | "labelkeep" | "LabelKeep" | "lowercase" | "Lowercase" | "uppercase" | "Uppercase" | "keepequal" | "KeepEqual" | "dropequal" | "DropEqual" | *"replace"
// Modulus to take of the hash of the source label values.
// Only applicable when the action is `HashMod`.
modulus?: int
// Regular expression against which the extracted value is
// matched.
regex?: string
// Replacement value against which a Replace action is performed
// if the regular expression matches.
// Regex capture groups are available.
replacement?: string
// Separator is the string between concatenated SourceLabels.
separator?: string
// The source labels select values from existing labels. Their
// content is concatenated using the configured Separator and
// matched against the configured regular expression.
sourceLabels?: [...=~"^[a-zA-Z_][a-zA-Z0-9_]*$"]
// Label to which the resulting string is written in a
// replacement.
// It is mandatory for `Replace`, `HashMod`, `Lowercase`,
// `Uppercase`, `KeepEqual` and `DropEqual` actions.
// Regex capture groups are available.
targetLabel?: string
}]
// The module to use for probing, specifying how to probe the
// target. Example module configuration in the blackbox exporter:
// https://github.com/prometheus/blackbox_exporter/blob/master/example.yml
module?: string
// OAuth2 for the URL. Only valid in Prometheus versions 2.27.0
// and newer.
oauth2?: {
// `clientId` specifies a key of a Secret or ConfigMap containing
// the OAuth2 client's ID.
clientId: {
// ConfigMap containing data to use for the targets.
configMap?: {
// The key to select.
key: string
// Name of the referent. More info:
// https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// TODO: Add other useful fields. apiVersion, kind, uid?
name?: string
// Specify whether the ConfigMap or its key must be defined
optional?: bool
}
// Secret containing data to use for the targets.
secret?: {
// The key of the secret to select from. Must be a valid secret
// key.
key: string
// Name of the referent. More info:
// https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// TODO: Add other useful fields. apiVersion, kind, uid?
name?: string
// Specify whether the Secret or its key must be defined
optional?: bool
}
}
// `clientSecret` specifies a key of a Secret containing the
// OAuth2 client's secret.
clientSecret: {
// The key of the secret to select from. Must be a valid secret
// key.
key: string
// Name of the referent. More info:
// https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// TODO: Add other useful fields. apiVersion, kind, uid?
name?: string
// Specify whether the Secret or its key must be defined
optional?: bool
}
// `endpointParams` configures the HTTP parameters to append to
// the token URL.
endpointParams?: {
[string]: string
}
// `scopes` defines the OAuth2 scopes used for the token request.
scopes?: [...string]
// `tokenUrl` configures the URL to fetch the token from.
tokenUrl: strings.MinRunes(1)
}
// Specification for the prober to use for probing targets. The
// prober.URL parameter is required. Targets cannot be probed if
// left empty.
prober?: {
// Path to collect metrics from. Defaults to `/probe`.
path?: string | *"/probe"
// Optional ProxyURL.
proxyUrl?: string
// HTTP scheme to use for scraping. `http` and `https` are the
// expected values unless you rewrite the `__scheme__` label via
// relabeling. If empty, Prometheus uses the default value
// `http`.
scheme?: "http" | "https"
// Mandatory URL of the prober.
url: string
}
// SampleLimit defines per-scrape limit on number of scraped
// samples that will be accepted.
sampleLimit?: int
// The scrape class to apply.
scrapeClass?: strings.MinRunes(1)
// `scrapeProtocols` defines the protocols to negotiate during a
// scrape. It tells clients the protocols supported by Prometheus
// in order of preference (from most to least preferred).
// If unset, Prometheus uses its default value.
// It requires Prometheus >= v2.49.0.
scrapeProtocols?: [..."PrometheusProto" | "OpenMetricsText0.0.1" | "OpenMetricsText1.0.0" | "PrometheusText0.0.4"]
// Timeout for scraping metrics from the Prometheus exporter. If
// not specified, the Prometheus global scrape timeout is used.
scrapeTimeout?: =~"^(0|(([0-9]+)y)?(([0-9]+)w)?(([0-9]+)d)?(([0-9]+)h)?(([0-9]+)m)?(([0-9]+)s)?(([0-9]+)ms)?)$"
// TargetLimit defines a limit on the number of scraped targets
// that will be accepted.
targetLimit?: int
// Targets defines a set of static or dynamically discovered
// targets to probe.
targets?: {
// ingress defines the Ingress objects to probe and the relabeling
// configuration. If `staticConfig` is also defined,
// `staticConfig` takes precedence.
ingress?: {
// From which namespaces to select Ingress objects.
namespaceSelector?: {
// Boolean describing whether all namespaces are selected in
// contrast to a list restricting them.
any?: bool
// List of namespace names to select from.
matchNames?: [...string]
}
// RelabelConfigs to apply to the label set of the target before
// it gets scraped. The original ingress address is available via
// the `__tmp_prometheus_ingress_address` label. It can be used
// to customize the probed URL. The original scrape job's name is
// available via the `__tmp_prometheus_job_name` label. More
// info:
// https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config
relabelingConfigs?: [...{
// Action to perform based on the regex matching.
// `Uppercase` and `Lowercase` actions require Prometheus >=
// v2.36.0. `DropEqual` and `KeepEqual` actions require
// Prometheus >= v2.41.0.
// Default: "Replace"
action?: "replace" | "Replace" | "keep" | "Keep" | "drop" | "Drop" | "hashmod" | "HashMod" | "labelmap" | "LabelMap" | "labeldrop" | "LabelDrop" | "labelkeep" | "LabelKeep" | "lowercase" | "Lowercase" | "uppercase" | "Uppercase" | "keepequal" | "KeepEqual" | "dropequal" | "DropEqual" | *"replace"
// Modulus to take of the hash of the source label values.
// Only applicable when the action is `HashMod`.
modulus?: int
// Regular expression against which the extracted value is
// matched.
regex?: string
// Replacement value against which a Replace action is performed
// if the regular expression matches.
// Regex capture groups are available.
replacement?: string
// Separator is the string between concatenated SourceLabels.
separator?: string
// The source labels select values from existing labels. Their
// content is concatenated using the configured Separator and
// matched against the configured regular expression.
sourceLabels?: [...=~"^[a-zA-Z_][a-zA-Z0-9_]*$"]
// Label to which the resulting string is written in a
// replacement.
// It is mandatory for `Replace`, `HashMod`, `Lowercase`,
// `Uppercase`, `KeepEqual` and `DropEqual` actions.
// Regex capture groups are available.
targetLabel?: string
}]
// Selector to select the Ingress objects.
selector?: {
// matchExpressions is a list of label selector requirements. The
// requirements are ANDed.
matchExpressions?: [...{
// key is the label key that the selector applies to.
key: string
// operator represents a key's relationship to a set of values.
// Valid operators are In, NotIn, Exists and DoesNotExist.
operator: string
// values is an array of string values. If the operator is In or
// NotIn, the values array must be non-empty. If the operator is
// Exists or DoesNotExist, the values array must be empty. This
// array is replaced during a strategic merge patch.
values?: [...string]
}]
// matchLabels is a map of {key,value} pairs. A single {key,value}
// in the matchLabels map is equivalent to an element of
// matchExpressions, whose key field is "key", the operator is
// "In", and the values array contains only "value". The
// requirements are ANDed.
matchLabels?: {
[string]: string
}
}
}
// staticConfig defines the static list of targets to probe and
// the relabeling configuration. If `ingress` is also defined,
// `staticConfig` takes precedence. More info:
// https://prometheus.io/docs/prometheus/latest/configuration/configuration/#static_config.
staticConfig?: {
// Labels assigned to all metrics scraped from the targets.
labels?: {
[string]: string
}
// RelabelConfigs to apply to the label set of the targets before
// it gets scraped. More info:
// https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config
relabelingConfigs?: [...{
// Action to perform based on the regex matching.
// `Uppercase` and `Lowercase` actions require Prometheus >=
// v2.36.0. `DropEqual` and `KeepEqual` actions require
// Prometheus >= v2.41.0.
// Default: "Replace"
action?: "replace" | "Replace" | "keep" | "Keep" | "drop" | "Drop" | "hashmod" | "HashMod" | "labelmap" | "LabelMap" | "labeldrop" | "LabelDrop" | "labelkeep" | "LabelKeep" | "lowercase" | "Lowercase" | "uppercase" | "Uppercase" | "keepequal" | "KeepEqual" | "dropequal" | "DropEqual" | *"replace"
// Modulus to take of the hash of the source label values.
// Only applicable when the action is `HashMod`.
modulus?: int
// Regular expression against which the extracted value is
// matched.
regex?: string
// Replacement value against which a Replace action is performed
// if the regular expression matches.
// Regex capture groups are available.
replacement?: string
// Separator is the string between concatenated SourceLabels.
separator?: string
// The source labels select values from existing labels. Their
// content is concatenated using the configured Separator and
// matched against the configured regular expression.
sourceLabels?: [...=~"^[a-zA-Z_][a-zA-Z0-9_]*$"]
// Label to which the resulting string is written in a
// replacement.
// It is mandatory for `Replace`, `HashMod`, `Lowercase`,
// `Uppercase`, `KeepEqual` and `DropEqual` actions.
// Regex capture groups are available.
targetLabel?: string
}]
// The list of hosts to probe.
static?: [...string]
}
}
// TLS configuration to use when scraping the endpoint.
tlsConfig?: {
// Certificate authority used when verifying server certificates.
ca?: {
// ConfigMap containing data to use for the targets.
configMap?: {
// The key to select.
key: string
// Name of the referent. More info:
// https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// TODO: Add other useful fields. apiVersion, kind, uid?
name?: string
// Specify whether the ConfigMap or its key must be defined
optional?: bool
}
// Secret containing data to use for the targets.
secret?: {
// The key of the secret to select from. Must be a valid secret
// key.
key: string
// Name of the referent. More info:
// https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// TODO: Add other useful fields. apiVersion, kind, uid?
name?: string
// Specify whether the Secret or its key must be defined
optional?: bool
}
}
// Client certificate to present when doing client-authentication.
cert?: {
// ConfigMap containing data to use for the targets.
configMap?: {
// The key to select.
key: string
// Name of the referent. More info:
// https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// TODO: Add other useful fields. apiVersion, kind, uid?
name?: string
// Specify whether the ConfigMap or its key must be defined
optional?: bool
}
// Secret containing data to use for the targets.
secret?: {
// The key of the secret to select from. Must be a valid secret
// key.
key: string
// Name of the referent. More info:
// https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// TODO: Add other useful fields. apiVersion, kind, uid?
name?: string
// Specify whether the Secret or its key must be defined
optional?: bool
}
}
// Disable target certificate validation.
insecureSkipVerify?: bool
// Secret containing the client key file for the targets.
keySecret?: {
// The key of the secret to select from. Must be a valid secret
// key.
key: string
// Name of the referent. More info:
// https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// TODO: Add other useful fields. apiVersion, kind, uid?
name?: string
// Specify whether the Secret or its key must be defined
optional?: bool
}
// Used to verify the hostname for the targets.
serverName?: string
}
}
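
A minimal sketch of a value shaped to fit the `targets` field above, assuming it is unified with the enclosing spec definition (whose name is not visible in this hunk). The host, label value, and field name `exampleTargets` are hypothetical. Because `action` defaults to `"replace"`, the relabeling entry only needs its source and target labels.

```cue
// Hypothetical static probe targets; the host and label values are illustrative only.
exampleTargets: staticConfig: {
	// The list of hosts to probe.
	static: ["https://example.com/healthz"]
	// Labels assigned to all metrics scraped from the targets.
	labels: environment: "prod"
	// action is omitted, so the schema default "replace" applies.
	relabelingConfigs: [{
		sourceLabels: ["__address__"]
		targetLabel: "instance"
	}]
}
```

Per the field comments, `staticConfig` takes precedence if `ingress` is also set, so the sketch sets only one of them.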

File diff suppressed because it is too large

View File

@@ -0,0 +1,100 @@
// Code generated by timoni. DO NOT EDIT.
//timoni:generate timoni vendor crd -f /home/jeff/workspace/holos-run/holos-infra/deploy/clusters/k2/components/prod-platform-monitoring/prod-platform-monitoring.gen.yaml
package v1
import "strings"
// PrometheusRule defines recording and alerting rules for a
// Prometheus instance
#PrometheusRule: {
// APIVersion defines the versioned schema of this representation
// of an object. Servers should convert recognized schemas to the
// latest internal value, and may reject unrecognized values.
// More info:
// https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
apiVersion: "monitoring.coreos.com/v1"
// Kind is a string value representing the REST resource this
// object represents. Servers may infer this from the endpoint
// the client submits requests to. Cannot be updated. In
// CamelCase. More info:
// https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
kind: "PrometheusRule"
metadata!: {
name!: strings.MaxRunes(253) & strings.MinRunes(1) & {
string
}
namespace!: strings.MaxRunes(63) & strings.MinRunes(1) & {
string
}
labels?: {
[string]: string
}
annotations?: {
[string]: string
}
}
// Specification of desired alerting rule definitions for
// Prometheus.
spec!: #PrometheusRuleSpec
}
#PrometheusRuleSpec: {
// Content of Prometheus rule file
groups?: [...{
// Interval determines how often rules in the group are evaluated.
interval?: =~"^(0|(([0-9]+)y)?(([0-9]+)w)?(([0-9]+)d)?(([0-9]+)h)?(([0-9]+)m)?(([0-9]+)s)?(([0-9]+)ms)?)$"
// Limit the number of alerts an alerting rule and series a
// recording rule can produce. Limit is supported starting with
// Prometheus >= 2.31 and Thanos Ruler >= 0.24.
limit?: int
// Name of the rule group.
name: strings.MinRunes(1)
// PartialResponseStrategy is only used by ThanosRuler and will be
// ignored by Prometheus instances. More info:
// https://github.com/thanos-io/thanos/blob/main/docs/components/rule.md#partial-response
partial_response_strategy?: =~"^(?i)(abort|warn)?$"
// List of alerting and recording rules.
rules?: [...{
// Name of the alert. Must be a valid label value. Only one of
// `record` and `alert` must be set.
alert?: string
// Annotations to add to each alert. Only valid for alerting
// rules.
annotations?: {
[string]: string
}
// PromQL expression to evaluate.
expr: (int | string) & {
string
}
// Alerts are considered firing once they have been returned for
// this long.
for?: =~"^(0|(([0-9]+)y)?(([0-9]+)w)?(([0-9]+)d)?(([0-9]+)h)?(([0-9]+)m)?(([0-9]+)s)?(([0-9]+)ms)?)$"
// KeepFiringFor defines how long an alert will continue firing
// after the condition that triggered it has cleared.
keep_firing_for?: strings.MinRunes(1) & {
=~"^(0|(([0-9]+)y)?(([0-9]+)w)?(([0-9]+)d)?(([0-9]+)h)?(([0-9]+)m)?(([0-9]+)s)?(([0-9]+)ms)?)$"
}
// Labels to add or overwrite.
labels?: {
[string]: string
}
// Name of the time series to output to. Must be a valid metric
// name. Only one of `record` and `alert` must be set.
record?: string
}]
}]
}
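
A minimal sketch of how the `#PrometheusRule` definition above might be used, assuming the value lives in the same `v1` package (the import path of the vendored package is not shown in this diff). The rule name, namespace, and PromQL expression are hypothetical. The `for` key is quoted because `for` is also a CUE comprehension keyword.

```cue
package v1

// Hypothetical alerting rule; all names and the expression are illustrative.
exampleRule: #PrometheusRule & {
	metadata: {
		name:      "example-alerts"
		namespace: "monitoring"
	}
	spec: groups: [{
		name:     "example.rules"
		interval: "1m" // must match the duration pattern in the schema
		rules: [{
			alert: "TargetDown"
			expr:  "up == 0"
			"for": "10m"
			labels: severity: "warning"
			annotations: summary: "A scrape target has been down for 10 minutes"
		}]
	}]
}
```

`apiVersion` and `kind` are fixed by the definition, so the example does not need to repeat them.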

View File

@@ -0,0 +1,566 @@
// Code generated by timoni. DO NOT EDIT.
//timoni:generate timoni vendor crd -f /home/jeff/workspace/holos-run/holos-infra/deploy/clusters/k2/components/prod-platform-monitoring/prod-platform-monitoring.gen.yaml
package v1
import "strings"
// ServiceMonitor defines monitoring for a set of services.
#ServiceMonitor: {
// APIVersion defines the versioned schema of this representation
// of an object. Servers should convert recognized schemas to the
// latest internal value, and may reject unrecognized values.
// More info:
// https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
apiVersion: "monitoring.coreos.com/v1"
// Kind is a string value representing the REST resource this
// object represents. Servers may infer this from the endpoint
// the client submits requests to. Cannot be updated. In
// CamelCase. More info:
// https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
kind: "ServiceMonitor"
metadata!: {
name!: strings.MaxRunes(253) & strings.MinRunes(1) & {
string
}
namespace!: strings.MaxRunes(63) & strings.MinRunes(1) & {
string
}
labels?: {
[string]: string
}
annotations?: {
[string]: string
}
}
// Specification of desired Service selection for target discovery
// by Prometheus.
spec!: #ServiceMonitorSpec
}
// Specification of desired Service selection for target discovery
// by Prometheus.
#ServiceMonitorSpec: {
attachMetadata?: {
// When set to true, Prometheus must have the `get` permission on
// the `Nodes` objects.
node?: bool
}
// List of endpoints part of this ServiceMonitor.
endpoints?: [...{
// `authorization` configures the Authorization header credentials
// to use when scraping the target.
// Cannot be set at the same time as `basicAuth`, or `oauth2`.
authorization?: {
// Selects a key of a Secret in the namespace that contains the
// credentials for authentication.
credentials?: {
// The key of the secret to select from. Must be a valid secret
// key.
key: string
// Name of the referent. More info:
// https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// TODO: Add other useful fields. apiVersion, kind, uid?
name?: string
// Specify whether the Secret or its key must be defined
optional?: bool
}
// Defines the authentication type. The value is case-insensitive.
// "Basic" is not a supported value.
// Default: "Bearer"
type?: string
}
// `basicAuth` configures the Basic Authentication credentials to
// use when scraping the target.
// Cannot be set at the same time as `authorization`, or `oauth2`.
basicAuth?: {
// `password` specifies a key of a Secret containing the password
// for authentication.
password?: {
// The key of the secret to select from. Must be a valid secret
// key.
key: string
// Name of the referent. More info:
// https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// TODO: Add other useful fields. apiVersion, kind, uid?
name?: string
// Specify whether the Secret or its key must be defined
optional?: bool
}
// `username` specifies a key of a Secret containing the username
// for authentication.
username?: {
// The key of the secret to select from. Must be a valid secret
// key.
key: string
// Name of the referent. More info:
// https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// TODO: Add other useful fields. apiVersion, kind, uid?
name?: string
// Specify whether the Secret or its key must be defined
optional?: bool
}
}
// File to read bearer token for scraping the target.
// Deprecated: use `authorization` instead.
bearerTokenFile?: string
// `bearerTokenSecret` specifies a key of a Secret containing the
// bearer token for scraping targets. The secret needs to be in
// the same namespace as the ServiceMonitor object and readable
// by the Prometheus Operator.
// Deprecated: use `authorization` instead.
bearerTokenSecret?: {
// The key of the secret to select from. Must be a valid secret
// key.
key: string
// Name of the referent. More info:
// https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// TODO: Add other useful fields. apiVersion, kind, uid?
name?: string
// Specify whether the Secret or its key must be defined
optional?: bool
}
// `enableHttp2` can be used to disable HTTP2 when scraping the
// target.
enableHttp2?: bool
// When true, the pods which are not running (e.g. either in
// Failed or Succeeded state) are dropped during the target
// discovery.
// If unset, the filtering is enabled.
// More info:
// https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase
filterRunning?: bool
// `followRedirects` defines whether the scrape requests should
// follow HTTP 3xx redirects.
followRedirects?: bool
// When true, `honorLabels` preserves the metric's labels when
// they collide with the target's labels.
honorLabels?: bool
// `honorTimestamps` controls whether Prometheus preserves the
// timestamps when exposed by the target.
honorTimestamps?: bool
// Interval at which Prometheus scrapes the metrics from the
// target.
// If empty, Prometheus uses the global scrape interval.
interval?: =~"^(0|(([0-9]+)y)?(([0-9]+)w)?(([0-9]+)d)?(([0-9]+)h)?(([0-9]+)m)?(([0-9]+)s)?(([0-9]+)ms)?)$"
// `metricRelabelings` configures the relabeling rules to apply to
// the samples before ingestion.
metricRelabelings?: [...{
// Action to perform based on the regex matching.
// `Uppercase` and `Lowercase` actions require Prometheus >=
// v2.36.0. `DropEqual` and `KeepEqual` actions require
// Prometheus >= v2.41.0.
// Default: "Replace"
action?: "replace" | "Replace" | "keep" | "Keep" | "drop" | "Drop" | "hashmod" | "HashMod" | "labelmap" | "LabelMap" | "labeldrop" | "LabelDrop" | "labelkeep" | "LabelKeep" | "lowercase" | "Lowercase" | "uppercase" | "Uppercase" | "keepequal" | "KeepEqual" | "dropequal" | "DropEqual" | *"replace"
// Modulus to take of the hash of the source label values.
// Only applicable when the action is `HashMod`.
modulus?: int
// Regular expression against which the extracted value is
// matched.
regex?: string
// Replacement value against which a Replace action is performed
// if the regular expression matches.
// Regex capture groups are available.
replacement?: string
// Separator is the string between concatenated SourceLabels.
separator?: string
// The source labels select values from existing labels. Their
// content is concatenated using the configured Separator and
// matched against the configured regular expression.
sourceLabels?: [...=~"^[a-zA-Z_][a-zA-Z0-9_]*$"]
// Label to which the resulting string is written in a
// replacement.
// It is mandatory for `Replace`, `HashMod`, `Lowercase`,
// `Uppercase`, `KeepEqual` and `DropEqual` actions.
// Regex capture groups are available.
targetLabel?: string
}]
// `oauth2` configures the OAuth2 settings to use when scraping
// the target.
// It requires Prometheus >= 2.27.0.
// Cannot be set at the same time as `authorization`, or
// `basicAuth`.
oauth2?: {
// `clientId` specifies a key of a Secret or ConfigMap containing
// the OAuth2 client's ID.
clientId: {
// ConfigMap containing data to use for the targets.
configMap?: {
// The key to select.
key: string
// Name of the referent. More info:
// https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// TODO: Add other useful fields. apiVersion, kind, uid?
name?: string
// Specify whether the ConfigMap or its key must be defined
optional?: bool
}
// Secret containing data to use for the targets.
secret?: {
// The key of the secret to select from. Must be a valid secret
// key.
key: string
// Name of the referent. More info:
// https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// TODO: Add other useful fields. apiVersion, kind, uid?
name?: string
// Specify whether the Secret or its key must be defined
optional?: bool
}
}
// `clientSecret` specifies a key of a Secret containing the
// OAuth2 client's secret.
clientSecret: {
// The key of the secret to select from. Must be a valid secret
// key.
key: string
// Name of the referent. More info:
// https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// TODO: Add other useful fields. apiVersion, kind, uid?
name?: string
// Specify whether the Secret or its key must be defined
optional?: bool
}
// `endpointParams` configures the HTTP parameters to append to
// the token URL.
endpointParams?: {
[string]: string
}
// `scopes` defines the OAuth2 scopes used for the token request.
scopes?: [...string]
// `tokenURL` configures the URL to fetch the token from.
tokenUrl: strings.MinRunes(1)
}
// params define optional HTTP URL parameters.
params?: {
[string]: [...string]
}
// HTTP path from which to scrape for metrics.
// If empty, Prometheus uses the default value (e.g. `/metrics`).
path?: string
// Name of the Service port which this endpoint refers to.
// It takes precedence over `targetPort`.
port?: string
// `proxyURL` configures the HTTP Proxy URL (e.g.
// "http://proxyserver:2195") to go through when scraping the
// target.
proxyUrl?: string
// `relabelings` configures the relabeling rules to apply to the
// target's metadata labels.
// The Operator automatically adds relabelings for a few standard
// Kubernetes fields.
// The original scrape job's name is available via the
// `__tmp_prometheus_job_name` label.
// More info:
// https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config
relabelings?: [...{
// Action to perform based on the regex matching.
// `Uppercase` and `Lowercase` actions require Prometheus >=
// v2.36.0. `DropEqual` and `KeepEqual` actions require
// Prometheus >= v2.41.0.
// Default: "Replace"
action?: "replace" | "Replace" | "keep" | "Keep" | "drop" | "Drop" | "hashmod" | "HashMod" | "labelmap" | "LabelMap" | "labeldrop" | "LabelDrop" | "labelkeep" | "LabelKeep" | "lowercase" | "Lowercase" | "uppercase" | "Uppercase" | "keepequal" | "KeepEqual" | "dropequal" | "DropEqual" | *"replace"
// Modulus to take of the hash of the source label values.
// Only applicable when the action is `HashMod`.
modulus?: int
// Regular expression against which the extracted value is
// matched.
regex?: string
// Replacement value against which a Replace action is performed
// if the regular expression matches.
// Regex capture groups are available.
replacement?: string
// Separator is the string between concatenated SourceLabels.
separator?: string
// The source labels select values from existing labels. Their
// content is concatenated using the configured Separator and
// matched against the configured regular expression.
sourceLabels?: [...=~"^[a-zA-Z_][a-zA-Z0-9_]*$"]
// Label to which the resulting string is written in a
// replacement.
// It is mandatory for `Replace`, `HashMod`, `Lowercase`,
// `Uppercase`, `KeepEqual` and `DropEqual` actions.
// Regex capture groups are available.
targetLabel?: string
}]
// HTTP scheme to use for scraping.
// `http` and `https` are the expected values unless you rewrite
// the `__scheme__` label via relabeling.
// If empty, Prometheus uses the default value `http`.
scheme?: "http" | "https"
// Timeout after which Prometheus considers the scrape to be
// failed.
// If empty, Prometheus uses the global scrape timeout unless it
// is less than the target's scrape interval value in which the
// latter is used.
scrapeTimeout?: =~"^(0|(([0-9]+)y)?(([0-9]+)w)?(([0-9]+)d)?(([0-9]+)h)?(([0-9]+)m)?(([0-9]+)s)?(([0-9]+)ms)?)$"
// Name or number of the target port of the `Pod` object behind
// the Service. The port must be specified with the container's
// port property.
targetPort?: (int | string) & {
string
}
// TLS configuration to use when scraping the target.
tlsConfig?: {
// Certificate authority used when verifying server certificates.
ca?: {
// ConfigMap containing data to use for the targets.
configMap?: {
// The key to select.
key: string
// Name of the referent. More info:
// https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// TODO: Add other useful fields. apiVersion, kind, uid?
name?: string
// Specify whether the ConfigMap or its key must be defined
optional?: bool
}
// Secret containing data to use for the targets.
secret?: {
// The key of the secret to select from. Must be a valid secret
// key.
key: string
// Name of the referent. More info:
// https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// TODO: Add other useful fields. apiVersion, kind, uid?
name?: string
// Specify whether the Secret or its key must be defined
optional?: bool
}
}
// Path to the CA cert in the Prometheus container to use for the
// targets.
caFile?: string
// Client certificate to present when doing client-authentication.
cert?: {
// ConfigMap containing data to use for the targets.
configMap?: {
// The key to select.
key: string
// Name of the referent. More info:
// https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// TODO: Add other useful fields. apiVersion, kind, uid?
name?: string
// Specify whether the ConfigMap or its key must be defined
optional?: bool
}
// Secret containing data to use for the targets.
secret?: {
// The key of the secret to select from. Must be a valid secret
// key.
key: string
// Name of the referent. More info:
// https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// TODO: Add other useful fields. apiVersion, kind, uid?
name?: string
// Specify whether the Secret or its key must be defined
optional?: bool
}
}
// Path to the client cert file in the Prometheus container for
// the targets.
certFile?: string
// Disable target certificate validation.
insecureSkipVerify?: bool
// Path to the client key file in the Prometheus container for the
// targets.
keyFile?: string
// Secret containing the client key file for the targets.
keySecret?: {
// The key of the secret to select from. Must be a valid secret
// key.
key: string
// Name of the referent. More info:
// https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
// TODO: Add other useful fields. apiVersion, kind, uid?
name?: string
// Specify whether the Secret or its key must be defined
optional?: bool
}
// Used to verify the hostname for the targets.
serverName?: string
}
// `trackTimestampsStaleness` defines whether Prometheus tracks
// staleness of the metrics that have an explicit timestamp
// present in scraped data. Has no effect if `honorTimestamps` is
// false.
// It requires Prometheus >= v2.48.0.
trackTimestampsStaleness?: bool
}]
// `jobLabel` selects the label from the associated Kubernetes
// `Service` object which will be used as the `job` label for all
// metrics.
// For example if `jobLabel` is set to `foo` and the Kubernetes
// `Service` object is labeled with `foo: bar`, then Prometheus
// adds the `job="bar"` label to all ingested metrics.
// If the value of this field is empty or if the label doesn't
// exist for the given Service, the `job` label of the metrics
// defaults to the name of the associated Kubernetes `Service`.
jobLabel?: string
// Per-scrape limit on the number of targets dropped by relabeling
// that will be kept in memory. 0 means no limit.
// It requires Prometheus >= v2.47.0.
keepDroppedTargets?: int
// Per-scrape limit on number of labels that will be accepted for
// a sample.
// It requires Prometheus >= v2.27.0.
labelLimit?: int
// Per-scrape limit on length of labels name that will be accepted
// for a sample.
// It requires Prometheus >= v2.27.0.
labelNameLengthLimit?: int
// Per-scrape limit on length of labels value that will be
// accepted for a sample.
// It requires Prometheus >= v2.27.0.
labelValueLengthLimit?: int
// Selector to select which namespaces the Kubernetes `Endpoints`
// objects are discovered from.
namespaceSelector?: {
// Boolean describing whether all namespaces are selected in
// contrast to a list restricting them.
any?: bool
// List of namespace names to select from.
matchNames?: [...string]
}
// `podTargetLabels` defines the labels which are transferred from
// the associated Kubernetes `Pod` object onto the ingested
// metrics.
podTargetLabels?: [...string]
// `sampleLimit` defines a per-scrape limit on the number of
// scraped samples that will be accepted.
sampleLimit?: int
// The scrape class to apply.
scrapeClass?: strings.MinRunes(1)
// `scrapeProtocols` defines the protocols to negotiate during a
// scrape. It tells clients the protocols supported by Prometheus
// in order of preference (from most to least preferred).
// If unset, Prometheus uses its default value.
// It requires Prometheus >= v2.49.0.
scrapeProtocols?: [..."PrometheusProto" | "OpenMetricsText0.0.1" | "OpenMetricsText1.0.0" | "PrometheusText0.0.4"]
// Label selector to select the Kubernetes `Endpoints` objects.
selector: {
// matchExpressions is a list of label selector requirements. The
// requirements are ANDed.
matchExpressions?: [...{
// key is the label key that the selector applies to.
key: string
// operator represents a key's relationship to a set of values.
// Valid operators are In, NotIn, Exists and DoesNotExist.
operator: string
// values is an array of string values. If the operator is In or
// NotIn, the values array must be non-empty. If the operator is
// Exists or DoesNotExist, the values array must be empty. This
// array is replaced during a strategic merge patch.
values?: [...string]
}]
// matchLabels is a map of {key,value} pairs. A single {key,value}
// in the matchLabels map is equivalent to an element of
// matchExpressions, whose key field is "key", the operator is
// "In", and the values array contains only "value". The
// requirements are ANDed.
matchLabels?: {
[string]: string
}
}
// `targetLabels` defines the labels which are transferred from
// the associated Kubernetes `Service` object onto the ingested
// metrics.
targetLabels?: [...string]
// `targetLimit` defines a limit on the number of scraped targets
// that will be accepted.
targetLimit?: int
}
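
A minimal sketch of a `#ServiceMonitor` value validated against the definition above, again assuming it is written in the same `v1` package. The service label, namespace names, and port name are hypothetical. Only `selector` is mandatory in `#ServiceMonitorSpec`; everything else shown is optional.

```cue
package v1

// Hypothetical monitor for a Service labeled app=example-app.
exampleMonitor: #ServiceMonitor & {
	metadata: {
		name:      "example-app"
		namespace: "monitoring"
	}
	spec: {
		// Required: label selector for the Endpoints objects.
		selector: matchLabels: app: "example-app"
		// Restrict discovery to a single (hypothetical) namespace.
		namespaceSelector: matchNames: ["example"]
		endpoints: [{
			port:     "metrics" // name of the Service port
			path:     "/metrics"
			interval: "30s"
			scheme:   "http"
			// action is omitted, so the schema default "replace" applies.
			relabelings: [{
				sourceLabels: ["__meta_kubernetes_pod_node_name"]
				targetLabel: "node"
			}]
		}]
	}
}
```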

File diff suppressed because it is too large

View File

@@ -306,19 +306,10 @@ import "strings"
// "value"` for prefix-based match - `regex: "value"` for RE2
// style regex-based match
// (https://github.com/google/re2/wiki/Syntax).
uri?: ({} | {
exact: _
} | {
prefix: _
} | {
regex: _
}) & {
uri?: {
exact?: string
prefix?: string
// RE2 style regex-based match
// (https://github.com/google/re2/wiki/Syntax).
regex?: string
regex?: string
}
// withoutHeader has the same syntax with the header, but has

View File

@@ -1,3 +1 @@
package v1alpha1
#HolosComponent: metadata: name: string

View File

@@ -7,14 +7,21 @@ import "encoding/yaml"
// apiObjects holds each of the API objects produced by cue.
apiObjects: {
[Kind=_]: {
[Name=_]: {
[string]: {
kind: Kind
...
}
}
Namespace?: [Name=_]: #Namespace & {metadata: name: Name}
ExternalSecret?: [Name=_]: #ExternalSecret & {_name: Name}
VirtualService?: [Name=_]: #VirtualService & {metadata: name: Name}
Issuer?: [Name=_]: #Issuer & {metadata: name: Name}
Gateway?: [Name=_]: #Gateway & {metadata: name: Name}
ConfigMap?: [Name=_]: #ConfigMap & {metadata: name: Name}
Deployment?: [_]: #Deployment
RequestAuthentication?: [_]: #RequestAuthentication
AuthorizationPolicy?: [_]: #AuthorizationPolicy
}
// apiObjectMap holds the marshalled representation of apiObjects

View File

@@ -24,6 +24,8 @@ let DependsOn = {[Name=_]: name: string & Name}
metadata: name: string
#namelen: len(metadata.name) & >=1
let Name = metadata.name
// TODO: ksContent needs to be component scoped, not instance scoped.
ksContent: yaml.Marshal(#Kustomization & {
_dependsOn: DEPENDS_ON
metadata: name: Name
@@ -47,7 +49,7 @@ let DependsOn = {[Name=_]: name: string & Name}
// Holos component types.
#HelmChart: #HolosComponent & h.#HelmChart & {
_values: _
_values: {...}
_kustomizeFiles: #KustomizeFiles
// Render the values to yaml for holos to provide to helm.
@@ -57,6 +59,11 @@ let DependsOn = {[Name=_]: name: string & Name}
resourcesFile: h.#ResourcesFile
// kustomizeFiles represents the files in a kustomize directory tree.
kustomizeFiles: _kustomizeFiles.Files
chart: h.#Chart & {
name: string
release: string | *name
}
}
#KubernetesObjects: #HolosComponent & h.#KubernetesObjects
#KustomizeBuild: #HolosComponent & h.#KustomizeBuild
@@ -93,10 +100,6 @@ let DependsOn = {[Name=_]: name: string & Name}
}
}
// #KustomizeTree represents a kustomize build.
#KustomizeFiles: {
}
// #Kustomize represents the kustomize post processor.
#Kustomize: kc.#Kustomization & {
_patches: {[_]: kc.#Patch}
@@ -109,3 +112,6 @@ let DependsOn = {[Name=_]: name: string & Name}
patches: [for v in _patches {v}]
}
}
// So components don't need to import the package.
#Patch: kc.#Patch

View File

@@ -0,0 +1,26 @@
package holos
// NOTE: Beyond the base reference platform, services should typically be added to #OptionalServices instead of directly to a managed namespace.
// ManagedNamespace is a namespace to manage across all clusters in the holos platform.
#ManagedNamespace: {
namespace: {
metadata: {
name: string
labels: [string]: string
}
}
// clusterNames represents the set of clusters the namespace is managed on. Usually all clusters.
clusterNames: [...string]
for cluster in clusterNames {
clusters: (cluster): name: cluster
}
}
// #ManagedNamespaces is the union of all namespaces across all cluster types and optional services.
// Holos adopts the namespace sameness position of SIG Multicluster, refer to https://github.com/kubernetes/community/blob/dd4c8b704ef1c9c3bfd928c6fa9234276d61ad18/sig-multicluster/namespace-sameness-position-statement.md
#ManagedNamespaces: {
[Name=_]: #ManagedNamespace & {
namespace: metadata: name: Name
}
}
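
A minimal sketch of how an entry might be declared against `#ManagedNamespaces`, assuming the same `holos` package. The field name `exampleNamespaces`, the namespace name, the label, and the cluster names are hypothetical. The pattern constraint sets `namespace.metadata.name` from the map key, and the comprehension in `#ManagedNamespace` derives one `clusters` entry per element of `clusterNames`.

```cue
package holos

// Hypothetical namespace managed on two clusters.
exampleNamespaces: #ManagedNamespaces & {
	"example-system": {
		// Each name here becomes a key under clusters via the comprehension.
		clusterNames: ["k1", "k2"]
		namespace: metadata: labels: "example.com/team": "platform"
	}
}
```

Keying the map by namespace name keeps the name in one place, so an entry cannot disagree with its own `metadata.name`.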

View File

@@ -0,0 +1,54 @@
package holos
// #MeshConfig provides the istio meshconfig in the config key for the given projects.
#MeshConfig: {
projects: #Projects
// clusterName is the value of the --cluster-name flag, the cluster currently being managed / rendered.
clusterName: string | *#ClusterName
// for extAuthzHttp extension providers
extensionProviderMap: [Name=_]: #ExtAuthzProxy & {name: Name}
// for other extension providers like zipkin
extensionProviderExtraMap: [Name=_]: {name: Name}
config: {
accessLogEncoding: string | *"JSON"
accessLogFile: string | *"/dev/stdout"
defaultConfig: {
discoveryAddress: string | *"istiod.istio-system.svc:15012"
tracing: zipkin: address: string | *"zipkin.istio-system:9411"
}
defaultProviders: metrics: [...string] | *["prometheus"]
enablePrometheusMerge: false | *true
rootNamespace: string | *"istio-system"
trustDomain: string | *"cluster.local"
extensionProviders: [
for x in extensionProviderMap {x},
for y in extensionProviderExtraMap {y},
]
}
}
// #ExtAuthzProxy defines the provider configuration for an istio external authorization auth proxy.
#ExtAuthzProxy: {
name: string
envoyExtAuthzHttp: {
headersToDownstreamOnDeny: [
"content-type",
"set-cookie",
]
headersToUpstreamOnAllow: [
"authorization",
"path",
"x-oidc-id-token",
]
includeAdditionalHeadersInCheck: "X-Auth-Request-Redirect": "%REQ(x-forwarded-proto)%://%REQ(:authority)%%REQ(:path)%%REQ(:query)%"
includeRequestHeadersInCheck: [
"authorization",
"cookie",
"x-forwarded-for",
]
port: 4180
service: string
}
}
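
`#MeshConfig` keeps extension providers in two maps keyed by name and flattens both into the `extensionProviders` list, with each entry's `name` field filled from its map key. A rough Python sketch of that flattening, for illustration only: the function name `extension_providers` and the sample provider names and fields are assumptions, and the explicit conflict check only approximates what CUE unification does.

```python
def extension_providers(authz_map, extra_map):
    """Flatten two name-keyed maps into one list, setting each entry's name from its key."""
    providers = []
    for source in (authz_map, extra_map):
        for name, config in source.items():
            # CUE unification would reject an explicit name that differs from the key;
            # mirror that here instead of silently overwriting it.
            if config.get("name", name) != name:
                raise ValueError(f"provider {name!r} has conflicting name {config['name']!r}")
            providers.append({**config, "name": name})
    return providers


# Hypothetical inputs: one ext-authz style provider and one extra provider.
authz = {"example-authproxy": {"envoyExtAuthzHttp": {"port": 4180, "service": "authproxy.example.svc"}}}
extra = {"zipkin": {"zipkin": {"port": 9411, "service": "zipkin.istio-system.svc"}}}

for provider in extension_providers(authz, extra):
    print(provider["name"])
```

Keeping the providers in maps rather than a list lets separate files contribute entries under distinct keys, and the list is only produced at the point where Istio needs it.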

@@ -1,6 +1,8 @@
// Controls optional feature flags for services distributed across multiple holos components.
// For example, enable issuing certificates in the provisioner cluster when an optional service is
// enabled for a workload cluster.
// enabled for a workload cluster. Another example is NATS, which isn't necessary on all clusters,
// but is necessary on clusters with a project like holos which depends on NATS.
package holos
import "list"

@@ -1,5 +1,21 @@
package holos
#PlatformServers: {
for cluster in #Platform.clusters {
(cluster.name): {
"https-istio-ingress-httpbin": {
let cert = #PlatformCerts[cluster.name+"-httpbin"]
hosts: [for host in cert.spec.dnsNames {"istio-ingress/\(host)"}]
port: name: "https-istio-ingress-httpbin"
port: number: 443
port: protocol: "HTTPS"
tls: credentialName: cert.spec.secretName
tls: mode: "SIMPLE"
}
}
}
}
#PlatformCerts: {
// Globally scoped platform services are defined here.
login: #PlatformCert & {

@@ -4,4 +4,4 @@ package holos
#InputKeys: project: "iam"
// Shared dependencies for all components in this collection.
#DependsOn: namespaces: name: "\(#StageName)-secrets-namespaces"
#DependsOn: namespaces: name: "\(#StageName)-secrets-stores"

@@ -0,0 +1,10 @@
To deploy monitoring:
> **_NOTE:_** For more detailed instructions on deploying, see the [documentation on installing Monitoring](https://access.crunchydata.com/documentation/postgres-operator/latest/installation/monitoring/kustomize).
1. Verify the namespace is correct in `kustomization.yaml`.
2. If you are deploying on OpenShift, comment out the `fsGroup` line under `securityContext` in the following files:
- `alertmanager/deployment.yaml`
- `grafana/deployment.yaml`
- `prometheus/deployment.yaml`
3. Run `kubectl apply -k .`

@@ -0,0 +1,78 @@
###
#
# Copyright © 2017-2024 Crunchy Data Solutions, Inc. All Rights Reserved.
#
###
# Based on upstream example file found here: https://github.com/prometheus/alertmanager/blob/master/doc/examples/simple.yml
global:
smtp_smarthost: 'localhost:25'
smtp_require_tls: false
smtp_from: 'Alertmanager <abc@yahoo.com>'
# smtp_smarthost: 'smtp.example.com:587'
# smtp_from: 'Alertmanager <abc@yahoo.com>'
# smtp_auth_username: '<username>'
# smtp_auth_password: '<password>'
# templates:
# - '/etc/alertmanager/template/*.tmpl'
inhibit_rules:
# Apply inhibition of warning if the alertname for the same system and service is already critical
- source_match:
severity: 'critical'
target_match:
severity: 'warning'
equal: ['alertname', 'job', 'service']
receivers:
- name: 'default-receiver'
email_configs:
- to: 'example@crunchydata.com'
send_resolved: true
## Examples of alternative alert receivers. See documentation for more info on how to configure these fully
#- name: 'pagerduty-dba'
# pagerduty_configs:
# - service_key: <RANDOMKEYSTUFF>
#- name: 'pagerduty-sre'
# pagerduty_configs:
# - service_key: <RANDOMKEYSTUFF>
#- name: 'dba-team'
# email_configs:
# - to: 'example-dba-team@crunchydata.com'
# send_resolved: true
#- name: 'sre-team'
# email_configs:
# - to: 'example-sre-team@crunchydata.com'
# send_resolved: true
route:
receiver: default-receiver
group_by: [severity, service, job, alertname]
group_wait: 30s
group_interval: 5m
repeat_interval: 24h
## Example routes to show how to route outgoing alerts based on the content of that alert
# routes:
# - match_re:
# service: ^(postgresql|mysql|oracle)$
# receiver: dba-team
# # sub route to send critical dba alerts to pagerduty
# routes:
# - match:
# severity: critical
# receiver: pagerduty-dba
#
# - match:
# service: system
# receiver: sre-team
# # sub route to send critical sre alerts to pagerduty
# routes:
# - match:
# severity: critical
# receiver: pagerduty-sre
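
The `inhibit_rules` entry in this `alertmanager.yml` mutes a `warning` alert while a `critical` alert with the same `alertname`, `job`, and `service` labels is firing. A minimal Python sketch of that matching logic, as an illustration only: `is_inhibited`, the dict-based alert shape, and the sample label values are assumptions, not Alertmanager's implementation.

```python
# Illustrative only: models the single inhibit rule from alertmanager.yml.
# Alerts are represented as plain label dicts, which is an assumption of this sketch.
EQUAL_LABELS = ("alertname", "job", "service")


def is_inhibited(target, firing):
    """Return True if `target` (a warning) is muted by a critical alert in `firing`."""
    if target.get("severity") != "warning":
        return False
    for source in firing:
        if source.get("severity") != "critical":
            continue
        # Every label listed under `equal` must have the same value on both alerts.
        if all(source.get(key) == target.get(key) for key in EQUAL_LABELS):
            return True
    return False


# Hypothetical alerts for demonstration.
critical = {"alertname": "PGDiskSize", "job": "crunchy", "service": "postgresql", "severity": "critical"}
warning = {"alertname": "PGDiskSize", "job": "crunchy", "service": "postgresql", "severity": "warning"}
other = {"alertname": "PGDiskSize", "job": "crunchy", "service": "system", "severity": "warning"}

print(is_inhibited(warning, [critical]))  # True
print(is_inhibited(other, [critical]))    # False
```

The second warning is not muted because its `service` label differs, so it still reaches `default-receiver`.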

@@ -0,0 +1,46 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: crunchy-alertmanager
spec:
selector: {}
template:
spec:
containers:
- name: alertmanager
image: prom/alertmanager:v0.24.0
args:
- --config.file=/etc/alertmanager/alertmanager.yml
- --storage.path=/alertmanager
- --log.level=info
- --cluster.advertise-address=0.0.0.0:9093
livenessProbe:
httpGet:
path: /-/healthy
port: 9093
initialDelaySeconds: 25
periodSeconds: 20
ports:
- containerPort: 9093
readinessProbe:
httpGet:
path: /-/ready
port: 9093
volumeMounts:
- mountPath: /etc/alertmanager
name: alertmanagerconf
- mountPath: /alertmanager
name: alertmanagerdata
securityContext:
fsGroup: 26
# supplementalGroups:
# - 65534
serviceAccountName: alertmanager
volumes:
- name: alertmanagerdata
persistentVolumeClaim:
claimName: alertmanagerdata
- name: alertmanagerconf
configMap:
defaultMode: 420
name: alertmanager-config

@@ -0,0 +1,21 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
labels:
- includeSelectors: true
pairs:
app.kubernetes.io/component: crunchy-alertmanager
resources:
- deployment.yaml
- pvc.yaml
- service.yaml
- serviceaccount.yaml
configMapGenerator:
- name: alertmanager-config
files:
- config/alertmanager.yml
generatorOptions:
disableNameSuffixHash: true

@@ -0,0 +1,10 @@
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: alertmanagerdata
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 5Gi

@@ -0,0 +1,9 @@
apiVersion: v1
kind: Service
metadata:
name: crunchy-alertmanager
spec:
type: ClusterIP
ports:
- name: alertmanager
port: 9093

@@ -0,0 +1,4 @@
apiVersion: v1
kind: ServiceAccount
metadata:
name: alertmanager

@@ -0,0 +1,16 @@
###
#
# Copyright © 2017-2024 Crunchy Data Solutions, Inc. All Rights Reserved.
#
###
apiVersion: 1
providers:
- name: 'crunchy_dashboards'
orgId: 1
folder: ''
type: file
disableDeletion: false
updateIntervalSeconds: 3 #how often Grafana will scan for changed dashboards
options:
path: /etc/grafana/provisioning/dashboards

@@ -0,0 +1,18 @@
###
#
# Copyright © 2017-2024 Crunchy Data Solutions, Inc. All Rights Reserved.
#
###
# config file version
apiVersion: 1
datasources:
- name: PROMETHEUS
type: prometheus
access: proxy
url: http://$PROM_HOST:$PROM_PORT
isDefault: True
editable: False
orgId: 1
version: 1

@@ -0,0 +1,16 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
configMapGenerator:
- name: grafana-dashboards
files:
- pgbackrest.json
- pod_details.json
- postgresql_overview.json
- postgresql_details.json
- postgresql_service_health.json
- prometheus_alerts.json
- query_statistics.json
generatorOptions:
disableNameSuffixHash: true

@@ -0,0 +1,687 @@
{
"__inputs": [
{
"name": "DS_PROMETHEUS",
"label": "PROMETHEUS",
"description": "",
"type": "datasource",
"pluginId": "prometheus",
"pluginName": "Prometheus"
}
],
"__requires": [
{
"type": "grafana",
"id": "grafana",
"name": "Grafana",
"version": "7.4.5"
},
{
"type": "panel",
"id": "graph",
"name": "Graph",
"version": ""
},
{
"type": "datasource",
"id": "prometheus",
"name": "Prometheus",
"version": "1.0.0"
},
{
"type": "panel",
"id": "stat",
"name": "Stat",
"version": ""
}
],
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": false,
"gnetId": null,
"graphTooltip": 0,
"id": null,
"iteration": 1625069660860,
"links": [
{
"asDropdown": false,
"icon": "external link",
"includeVars": true,
"keepTime": true,
"tags": [
"vendor=crunchydata"
],
"title": "",
"type": "dashboards"
}
],
"panels": [
{
"datasource": "PROMETHEUS",
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"custom": {},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "semi-dark-blue",
"value": null
}
]
},
"unit": "dtdhms"
},
"overrides": []
},
"gridPos": {
"h": 3,
"w": 24,
"x": 0,
"y": 0
},
"id": 8,
"options": {
"colorMode": "background",
"graphMode": "area",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": [
"last"
],
"fields": "/^Value$/",
"values": false
},
"text": {
"valueSize": 45
},
"textMode": "auto"
},
"pluginVersion": "7.4.5",
"targets": [
{
"expr": "time()-ccp_backrest_oldest_full_backup_time_seconds{pg_cluster=\"[[cluster]]\", role=\"master\"}",
"format": "table",
"instant": true,
"interval": "",
"legendFormat": "Recovery window",
"refId": "A"
}
],
"title": "Recovery Window",
"type": "stat"
},
{
"aliasColors": {
"Differential": "dark-blue",
"Differential Backup": "dark-blue",
"Full": "dark-green",
"Full Backup": "dark-green",
"Incremental": "light-blue",
"Incremental Backup": "light-blue"
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "PROMETHEUS",
"fieldConfig": {
"defaults": {
"custom": {},
"links": []
},
"overrides": []
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 7,
"w": 12,
"x": 0,
"y": 3
},
"hiddenSeries": false,
"id": 2,
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": 150,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"options": {
"alertThreshold": false
},
"percentage": false,
"pluginVersion": "7.4.5",
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "min(ccp_backrest_last_incr_backup_time_since_completion_seconds{pg_cluster=\"[[cluster]]\", role=\"master\"}) without(deployment,instance,ip,pod)",
"format": "time_series",
"instant": false,
"interval": "",
"intervalFactor": 1,
"legendFormat": "Incremental Backup",
"refId": "A"
},
{
"expr": "min(ccp_backrest_last_diff_backup_time_since_completion_seconds{pg_cluster=\"[[cluster]]\", role=\"master\"}) without(deployment, instance,ip,pod)",
"hide": false,
"interval": "",
"legendFormat": "Differential Backup",
"refId": "B"
},
{
"expr": "min(ccp_backrest_last_full_backup_time_since_completion_seconds{pg_cluster=\"[[cluster]]\", role=\"master\"}) without(deployment, instance,ip,pod)",
"hide": false,
"interval": "",
"legendFormat": "Full Backup",
"refId": "C"
},
{
"expr": "min(ccp_archive_command_status_seconds_since_last_archive{pg_cluster=\"[[cluster]]\", role=\"master\"}) without(deployment, instance,ip,pod)",
"hide": false,
"interval": "",
"legendFormat": "WAL Archive",
"refId": "D"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Time Since",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {
"Differential": "dark-blue",
"Full": "dark-green",
"Incremental": "light-blue"
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "PROMETHEUS",
"fieldConfig": {
"defaults": {
"custom": {},
"links": []
},
"overrides": []
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 7,
"w": 12,
"x": 12,
"y": 3
},
"hiddenSeries": false,
"id": 4,
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"hideEmpty": false,
"hideZero": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": 150,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"options": {
"alertThreshold": true
},
"percentage": false,
"pluginVersion": "7.4.5",
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "min(ccp_backrest_last_info_backup_runtime_seconds{pg_cluster=\"[[cluster]]\", role=\"master\", backup_type=\"incr\"}) without (deployment,instance,pod,ip)",
"format": "time_series",
"instant": false,
"interval": "",
"intervalFactor": 1,
"legendFormat": "Incremental",
"refId": "A"
},
{
"expr": "min(ccp_backrest_last_info_backup_runtime_seconds{pg_cluster=\"[[cluster]]\", role=\"master\", backup_type=\"diff\"}) without (deployment,instance,pod,ip)",
"hide": false,
"interval": "",
"legendFormat": "Differential",
"refId": "B"
},
{
"expr": "min(ccp_backrest_last_info_backup_runtime_seconds{pg_cluster=\"[[cluster]]\", role=\"master\", backup_type=\"full\"}) without (deployment,instance,pod,ip)",
"hide": false,
"interval": "",
"legendFormat": "Full",
"refId": "C"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Backup Runtimes",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 2,
"max": null,
"min": null,
"show": false
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {
"Differential": "dark-blue",
"Full": "dark-green",
"Incremental": "light-blue"
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "PROMETHEUS",
"description": "",
"fieldConfig": {
"defaults": {
"custom": {},
"links": []
},
"overrides": []
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 7,
"w": 12,
"x": 0,
"y": 10
},
"hiddenSeries": false,
"id": 5,
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"hideEmpty": false,
"hideZero": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": 150,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"options": {
"alertThreshold": true
},
"percentage": false,
"pluginVersion": "7.4.5",
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "min(ccp_backrest_last_info_repo_backup_size_bytes{pg_cluster=\"[[cluster]]\", role=\"master\", backup_type=\"incr\"}) without (deployment, instance,pod,ip)",
"format": "time_series",
"instant": false,
"interval": "",
"intervalFactor": 1,
"legendFormat": "Incremental",
"refId": "A"
},
{
"expr": "min(ccp_backrest_last_info_repo_backup_size_bytes{pg_cluster=\"[[cluster]]\", role=\"master\", backup_type=\"diff\"}) without (deployment,instance,pod,ip)",
"hide": false,
"interval": "",
"legendFormat": "Differential",
"refId": "B"
},
{
"expr": "min(ccp_backrest_last_info_repo_backup_size_bytes{pg_cluster=\"[[cluster]]\", role=\"master\", backup_type=\"full\"}) without (deployment,instance,pod,ip)",
"hide": false,
"interval": "",
"legendFormat": "Full",
"refId": "C"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Backup Size",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 2,
"max": null,
"min": null,
"show": false
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {
"Archive age": "blue",
"Archive count": "green",
"Differential": "dark-blue",
"Failed count": "red",
"Full": "dark-green",
"Incremental": "light-blue"
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "PROMETHEUS",
"description": "",
"fieldConfig": {
"defaults": {
"custom": {},
"links": []
},
"overrides": []
},
"fill": 3,
"fillGradient": 0,
"gridPos": {
"h": 7,
"w": 12,
"x": 12,
"y": 10
},
"hiddenSeries": false,
"id": 6,
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"hideEmpty": false,
"hideZero": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": 150,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"options": {
"alertThreshold": true
},
"percentage": false,
"pluginVersion": "7.4.5",
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "avg(idelta(ccp_archive_command_status_failed_count{pg_cluster=\"[[cluster]]\", role=\"master\"}[1m])) without (instance,ip)",
"format": "time_series",
"instant": false,
"interval": "",
"intervalFactor": 1,
"legendFormat": "Failed count",
"refId": "A"
},
{
"expr": "avg(idelta(ccp_archive_command_status_archived_count{pg_cluster=\"[[cluster]]\", role=\"master\"}[1m])) without (instance,pod, ip)",
"hide": false,
"interval": "",
"legendFormat": "Archive count",
"refId": "B"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "WAL Stats",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"label": "",
"logBase": 1,
"max": null,
"min": "0",
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": "0",
"show": false
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
}
],
"refresh": "5m",
"schemaVersion": 27,
"style": "dark",
"tags": [
"vendor=crunchydata"
],
"templating": {
"list": [
{
"allValue": null,
"current": {},
"datasource": "PROMETHEUS",
"definition": "label_values(pg_cluster)",
"description": null,
"error": null,
"hide": 0,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [],
"query": {
"query": "label_values(pg_cluster)",
"refId": "PROMETHEUS-cluster-Variable-Query"
},
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-2w",
"to": "now"
},
"timepicker": {
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "browser",
"title": "pgBackRest",
"uid": "2fcFZ6PGk",
"version": 1
}

@@ -0,0 +1,237 @@
{
"__inputs": [
{
"name": "DS_PROMETHEUS",
"label": "PROMETHEUS",
"description": "",
"type": "datasource",
"pluginId": "prometheus",
"pluginName": "Prometheus"
}
],
"__requires": [
{
"type": "grafana",
"id": "grafana",
"name": "Grafana",
"version": "7.4.5"
},
{
"type": "datasource",
"id": "prometheus",
"name": "Prometheus",
"version": "1.0.0"
},
{
"type": "panel",
"id": "stat",
"name": "Stat",
"version": ""
}
],
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": false,
"gnetId": null,
"graphTooltip": 0,
"id": null,
"iteration": 1625069480601,
"links": [],
"panels": [
{
"cacheTimeout": null,
"datasource": "PROMETHEUS",
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"custom": {},
"links": [
{
"targetBlank": true,
"title": "Cluster Details",
"url": "d/fMip0cuMk/postgresqldetails?$__all_variables"
},
{
"targetBlank": true,
"title": "Backup Details",
"url": "d/2fcFZ6PGk/pgbackrest?$__all_variables"
},
{
"targetBlank": true,
"title": "Pod Details",
"url": "d/4auP6Mk7k/pod-details?$__all_variables"
},
{
"targetBlank": true,
"title": "Query Statistics",
"url": "d/ZKoTOHDGk/query-statistics?$__all_variables"
},
{
"targetBlank": true,
"title": "Service Health",
"url": "d/dhG1wgsMz/postgresql-service-health?$__all_variables"
}
],
"mappings": [
{
"from": "0",
"id": 0,
"text": "DOWN",
"to": "99",
"type": 2
},
{
"from": "100",
"id": 1,
"text": "Standalone Cluster",
"to": "199",
"type": 2
},
{
"from": "200",
"id": 2,
"text": "HA CLUSTER",
"to": "1000",
"type": 2
}
],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "#bf1b00",
"value": null
},
{
"color": "#eab839",
"value": 10
},
{
"color": "#56A64B",
"value": 100
}
]
},
"unit": "short"
},
"overrides": []
},
"gridPos": {
"h": 2,
"w": 12,
"x": 0,
"y": 0
},
"id": 1,
"interval": null,
"links": [],
"maxDataPoints": 100,
"maxPerRow": 2,
"options": {
"colorMode": "background",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "horizontal",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"text": {
"valueSize": 30
},
"textMode": "auto"
},
"pluginVersion": "7.4.5",
"repeat": "cluster",
"repeatDirection": "h",
"targets": [
{
"$hashKey": "object:243",
"expr": "sum(pg_up{pg_cluster=~\"$cluster\"})*100+sum(ccp_is_in_recovery_status{pg_cluster=~\"$cluster\"})",
"format": "time_series",
"interval": "",
"intervalFactor": 1,
"legendFormat": "{{cluster}}",
"metric": "up",
"refId": "A",
"step": 2
}
],
"title": "$cluster - Overview",
"type": "stat"
}
],
"refresh": "5m",
"schemaVersion": 27,
"style": "dark",
"tags": [],
"templating": {
"list": [
{
"allFormat": "glob",
"allValue": null,
"current": {},
"datasource": "PROMETHEUS",
"definition": "label_values(pg_cluster)",
"description": null,
"error": null,
"hide": 1,
"includeAll": true,
"label": "cluster",
"multi": true,
"name": "cluster",
"options": [],
"query": {
"query": "label_values(pg_cluster)",
"refId": "PROMETHEUS-cluster-Variable-Query"
},
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 0,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-5m",
"to": "now"
},
"timepicker": {
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "browser",
"title": "PostgreSQL Overview",
"uid": "D2X39SlGk",
"version": 1
}
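
The stat panel in the PostgreSQL Overview dashboard encodes cluster state in one number, `sum(pg_up) * 100 + sum(ccp_is_in_recovery_status)`, and its value mappings translate that number to `DOWN` (0 to 99), `Standalone Cluster` (100 to 199), or `HA CLUSTER` (200 to 1000). A small sketch of that arithmetic follows. It assumes each instance reports `pg_up` as 0 or 1; the recovery status samples are placeholder values, and `overview_status` is a hypothetical helper, not part of the dashboard.

```python
def overview_status(pg_up, recovery_status):
    """Combine per-instance samples the way the panel's PromQL expression does."""
    value = sum(pg_up) * 100 + sum(recovery_status)
    # Range mappings copied from the panel's "mappings" list.
    if 0 <= value <= 99:
        return value, "DOWN"
    if 100 <= value <= 199:
        return value, "Standalone Cluster"
    if 200 <= value <= 1000:
        return value, "HA CLUSTER"
    return value, "unmapped"


print(overview_status([1], [2]))        # (102, 'Standalone Cluster')
print(overview_status([1, 1], [2, 1]))  # (203, 'HA CLUSTER')
print(overview_status([0, 0], [0, 0]))  # (0, 'DOWN')
```

Multiplying the instance count by 100 keeps it in the hundreds digit, so the mapping ranges effectively read off how many instances are up as long as the recovery term stays below 100.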

@@ -0,0 +1,649 @@
{
"__inputs": [
{
"name": "DS_PROMETHEUS",
"label": "PROMETHEUS",
"description": "",
"type": "datasource",
"pluginId": "prometheus",
"pluginName": "Prometheus"
}
],
"__requires": [
{
"type": "grafana",
"id": "grafana",
"name": "Grafana",
"version": "7.4.5"
},
{
"type": "panel",
"id": "graph",
"name": "Graph",
"version": ""
},
{
"type": "datasource",
"id": "prometheus",
"name": "Prometheus",
"version": "1.0.0"
}
],
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": false,
"gnetId": null,
"graphTooltip": 0,
"id": null,
"iteration": 1625069909806,
"links": [
{
"asDropdown": false,
"icon": "external link",
"includeVars": true,
"keepTime": true,
"tags": [
"vendor=crunchydata"
],
"title": "",
"type": "dashboards"
}
],
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "PROMETHEUS",
"fieldConfig": {
"defaults": {
"custom": {},
"links": []
},
"overrides": []
},
"fill": 1,
"fillGradient": 5,
"gridPos": {
"h": 7,
"w": 12,
"x": 0,
"y": 0
},
"hiddenSeries": false,
"id": 6,
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": 150,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"options": {
"alertThreshold": true
},
"percentage": false,
"pluginVersion": "7.4.5",
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(ccp_connection_stats_total{pg_cluster=\"[[cluster]]\",role=\"[[role]]\"}) without (pod,instance,ip) / sum(ccp_connection_stats_max_connections{pg_cluster=\"[[cluster]]\",role=\"[[role]]\"}) without (pod,instance,ip)",
"format": "time_series",
"instant": false,
"interval": "",
"intervalFactor": 1,
"legendFormat": "Connections",
"refId": "C"
},
{
"expr": "100 - 100 * avg(ccp_nodemx_data_disk_available_bytes{pg_cluster=\"[[cluster]]\",role=\"[[role]]\"}) without (pod,instance,ip) / avg(ccp_nodemx_data_disk_total_bytes{pg_cluster=\"[[cluster]]\",role=\"[[role]]\"}) without (pod,instance,ip)",
"format": "time_series",
"interval": "",
"intervalFactor": 1,
"legendFormat": "Mount:{{mount_point}}",
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Saturation (pct used)",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"decimals": null,
"format": "percent",
"label": null,
"logBase": 1,
"max": "100",
"min": "0",
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"cacheTimeout": null,
"dashLength": 10,
"dashes": false,
"datasource": "PROMETHEUS",
"fieldConfig": {
"defaults": {
"custom": {},
"links": []
},
"overrides": []
},
"fill": 1,
"fillGradient": 5,
"gridPos": {
"h": 7,
"w": 12,
"x": 12,
"y": 0
},
"hiddenSeries": false,
"id": 18,
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": 150,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"options": {
"alertThreshold": true
},
"percentage": false,
"pluginVersion": "7.4.5",
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"exemplar": false,
"expr": " sum(irate(ccp_stat_database_xact_commit{pg_cluster=\"[[cluster]]\",role=\"[[role]]\"}[1m])) \n+ sum(irate(ccp_stat_database_xact_rollback{pg_cluster=\"[[cluster]]\",role=\"[[role]]\"}[1m]))",
"format": "time_series",
"interval": "",
"intervalFactor": 1,
"legendFormat": "Transactions",
"refId": "A"
},
{
"expr": "max(ccp_connection_stats_active{pg_cluster=\"[[cluster]]\",role=\"[[role]]\"}) without (pod,instance,ip,dbname)",
"format": "time_series",
"interval": "",
"intervalFactor": 1,
"legendFormat": "Active connections",
"refId": "C"
},
{
"expr": "sum(irate(ccp_pg_stat_statements_total_calls_count{pg_cluster=\"[[cluster]]\",role=\"[[role]]\"}[1m]))",
"format": "time_series",
"hide": false,
"interval": "",
"intervalFactor": 1,
"legendFormat": "Queries",
"refId": "B"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Traffic",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"label": "",
"logBase": 1,
"max": null,
"min": "0.001",
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "PROMETHEUS",
"description": "Errors",
"fieldConfig": {
"defaults": {
"custom": {},
"links": []
},
"overrides": []
},
"fill": 1,
"fillGradient": 5,
"gridPos": {
"h": 7,
"w": 12,
"x": 0,
"y": 7
},
"hiddenSeries": false,
"id": 4,
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": 150,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"options": {
"alertThreshold": true
},
"percentage": false,
"pluginVersion": "7.4.5",
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(irate(ccp_stat_database_xact_rollback{pg_cluster=\"[[cluster]]\",role=\"[[role]]\"}[1m])) without(pod,instance,ip)",
"format": "time_series",
"hide": true,
"interval": "",
"intervalFactor": 1,
"legendFormat": "Rollbacks",
"refId": "A"
},
{
"expr": "sum(irate(ccp_stat_database_deadlocks{pg_cluster=\"[[cluster]]\",role=\"[[role]]\"}[1m])) without(pod,instance,ip,dbname)",
"format": "time_series",
"hide": false,
"interval": "",
"intervalFactor": 1,
"legendFormat": "Deadlock ",
"refId": "D"
},
{
"expr": "sum(irate(ccp_stat_database_conflicts{pg_cluster=\"[[cluster]]\",role=\"[[role]]\"}[1m])) without(pod,instance,ip,dbname)",
"format": "time_series",
"hide": false,
"interval": "",
"intervalFactor": 1,
"legendFormat": "Conflicts",
"refId": "B"
},
{
"expr": "max(pg_exporter_last_scrape_error{pg_cluster=\"[[cluster]]\",role=\"[[role]]\"}) without(pod,instance,ip,dbname)",
"format": "time_series",
"hide": false,
"interval": "",
"intervalFactor": 1,
"legendFormat": "scrape error",
"refId": "C"
},
{
"expr": "max(clamp_max(ccp_archive_command_status_seconds_since_last_fail{pg_cluster=\"[[cluster]]\",role=\"[[role]]\"},1)) without (instance,pod,ip)",
"format": "time_series",
"hide": false,
"interval": "",
"intervalFactor": 1,
"legendFormat": "archive error",
"refId": "E"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Errors",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"decimals": null,
"format": "short",
"label": "",
"logBase": 2,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "PROMETHEUS",
"fieldConfig": {
"defaults": {
"custom": {},
"links": []
},
"overrides": []
},
"fill": 1,
"fillGradient": 1,
"gridPos": {
"h": 7,
"w": 12,
"x": 12,
"y": 7
},
"hiddenSeries": false,
"id": 10,
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"sideWidth": 150,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"options": {
"alertThreshold": true
},
"percentage": false,
"pluginVersion": "7.4.5",
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
{
"alias": "/Max:/",
"color": "#E02F44",
"nullPointMode": "null as zero"
},
{
"alias": "/Avg:/",
"color": "#8AB8FF"
}
],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "max(ccp_pg_stat_statements_total_mean_exec_time_ms{pg_cluster=\"[[cluster]]\",role=\"[[role]]\"}) without (pod,instance,ip)",
"format": "time_series",
"hide": false,
"instant": false,
"interval": "",
"intervalFactor": 1,
"legendFormat": "Avg: {{exported_role}}({{dbname}})",
"refId": "A"
},
{
"expr": "max(ccp_pg_stat_statements_top_max_exec_time_ms{pg_cluster=\"[[cluster]]\",role=\"[[role]]\"}) without (pod,instance,ip,query,queryid)",
"format": "time_series",
"hide": false,
"instant": false,
"interval": "",
"intervalFactor": 1,
"legendFormat": "Max: {{exported_role}}({{dbname}})",
"refId": "B"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Query Duration",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"decimals": null,
"format": "ms",
"label": null,
"logBase": 2,
"max": null,
"min": "0",
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
}
],
"refresh": "5m",
"schemaVersion": 27,
"style": "dark",
"tags": [
"vendor=crunchydata"
],
"templating": {
"list": [
{
"allValue": null,
"current": {},
"datasource": "PROMETHEUS",
"definition": "label_values(pg_cluster)",
"description": null,
"error": null,
"hide": 0,
"includeAll": false,
"label": null,
"multi": false,
"name": "cluster",
"options": [],
"query": {
"query": "label_values(pg_cluster)",
"refId": "PROMETHEUS-cluster-Variable-Query"
},
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 0,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {},
"datasource": "PROMETHEUS",
"definition": "label_values({pg_cluster=\"[[cluster]]\"},role)",
"description": null,
"error": null,
"hide": 0,
"includeAll": false,
"label": null,
"multi": false,
"name": "role",
"options": [],
"query": {
"query": "label_values({pg_cluster=\"[[cluster]]\"},role)",
"refId": "PROMETHEUS-role-Variable-Query"
},
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 0,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "browser",
"title": "PostgreSQL Service Health",
"uid": "dhG1wgsMz",
"version": 1
}

@@ -0,0 +1,961 @@
{
"__inputs": [
{
"name": "DS_PROMETHEUS",
"label": "PROMETHEUS",
"description": "",
"type": "datasource",
"pluginId": "prometheus",
"pluginName": "Prometheus"
}
],
"__requires": [
{
"type": "grafana",
"id": "grafana",
"name": "Grafana",
"version": "7.4.5"
},
{
"type": "datasource",
"id": "prometheus",
"name": "Prometheus",
"version": "1.0.0"
},
{
"type": "panel",
"id": "stat",
"name": "Stat",
"version": ""
},
{
"type": "panel",
"id": "table",
"name": "Table",
"version": ""
}
],
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"description": "Show current firing and pending alerts, and severity alert counts.",
"editable": false,
"gnetId": 4181,
"graphTooltip": 0,
"id": null,
"links": [
{
"icon": "external link",
"tags": [
"vendor=crunchydata"
],
"type": "dashboards"
}
],
"panels": [
{
"collapsed": false,
"datasource": "PROMETHEUS",
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 0
},
"id": 10,
"panels": [],
"repeat": null,
"title": "Environment Summary",
"type": "row"
},
{
"cacheTimeout": null,
"datasource": "PROMETHEUS",
"description": "",
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"custom": {},
"mappings": [
{
"id": 0,
"op": "=",
"text": "N/A",
"type": 1,
"value": "null"
}
],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "semi-dark-blue",
"value": null
}
]
},
"unit": "none"
},
"overrides": []
},
"gridPos": {
"h": 2,
"w": 4,
"x": 0,
"y": 1
},
"id": 6,
"interval": null,
"links": [],
"maxDataPoints": 100,
"options": {
"colorMode": "background",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "horizontal",
"reduceOptions": {
"calcs": [],
"fields": "",
"values": false
},
"text": {},
"textMode": "auto"
},
"pluginVersion": "7.4.5",
"targets": [
{
"expr": "count(count by (kubernetes_namespace) (pg_up))",
"format": "time_series",
"instant": true,
"interval": "",
"intervalFactor": 2,
"legendFormat": "Namespaces",
"refId": "A"
}
],
"title": "Namespaces",
"type": "stat"
},
{
"cacheTimeout": null,
"datasource": "PROMETHEUS",
"description": "",
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"custom": {},
"mappings": [
{
"id": 0,
"op": "=",
"text": "N/A",
"type": 1,
"value": "null"
}
],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "semi-dark-blue",
"value": null
}
]
},
"unit": "none"
},
"overrides": []
},
"gridPos": {
"h": 2,
"w": 4,
"x": 4,
"y": 1
},
"id": 13,
"interval": null,
"links": [],
"maxDataPoints": 100,
"options": {
"colorMode": "background",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "horizontal",
"reduceOptions": {
"calcs": [
"mean"
],
"fields": "",
"values": false
},
"text": {},
"textMode": "auto"
},
"pluginVersion": "7.4.5",
"targets": [
{
"expr": "count(count by (pg_cluster) (pg_up))",
"format": "time_series",
"instant": true,
"interval": "",
"intervalFactor": 2,
"legendFormat": "PostgreSQL Clusters",
"refId": "A"
}
],
"title": "PG Clusters",
"type": "stat"
},
{
"cacheTimeout": null,
"datasource": "PROMETHEUS",
"description": "",
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"custom": {},
"mappings": [
{
"id": 0,
"op": "=",
"text": "N/A",
"type": 1,
"value": "null"
}
],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "semi-dark-blue",
"value": null
}
]
},
"unit": "none"
},
"overrides": []
},
"gridPos": {
"h": 2,
"w": 4,
"x": 8,
"y": 1
},
"id": 14,
"interval": null,
"links": [],
"maxDataPoints": 100,
"options": {
"colorMode": "background",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "horizontal",
"reduceOptions": {
"calcs": [
"mean"
],
"fields": "",
"values": false
},
"text": {},
"textMode": "auto"
},
"pluginVersion": "7.4.5",
"targets": [
{
"expr": "count(pg_up)",
"format": "time_series",
"instant": true,
"interval": "",
"intervalFactor": 2,
"legendFormat": "PostgreSQL Instances",
"refId": "A"
}
],
"title": "PG Instances",
"type": "stat"
},
{
"collapsed": false,
"datasource": "PROMETHEUS",
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 3
},
"id": 11,
"panels": [],
"repeat": null,
"title": "Alert Summary",
"type": "row"
},
{
"cacheTimeout": null,
"datasource": "PROMETHEUS",
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"custom": {},
"mappings": [
{
"id": 0,
"op": "=",
"text": "N/A",
"type": 1,
"value": "null"
}
],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "semi-dark-red",
"value": null
},
{
"color": "#F2495C",
"value": 1
},
{
"color": "#F2495C"
}
]
},
"unit": "none"
},
"overrides": []
},
"gridPos": {
"h": 2,
"w": 4,
"x": 0,
"y": 4
},
"id": 2,
"interval": null,
"links": [],
"maxDataPoints": 100,
"options": {
"colorMode": "background",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "horizontal",
"reduceOptions": {
"calcs": [
"mean"
],
"fields": "",
"values": false
},
"text": {},
"textMode": "auto"
},
"pluginVersion": "7.4.5",
"targets": [
{
"bucketAggs": [
{
"id": "2",
"settings": {
"interval": "auto",
"min_doc_count": 0,
"trimEdges": 0
},
"type": "date_histogram"
}
],
"dsType": "elasticsearch",
"expr": "sum(ALERTS{alertstate=\"firing\",severity=\"critical\"} > 0) OR on() vector(0)",
"format": "time_series",
"instant": true,
"interval": "",
"intervalFactor": 1,
"legendFormat": "Critical",
"metrics": [
{
"field": "select field",
"id": "1",
"type": "count"
}
],
"refId": "A"
}
],
"title": "Critical",
"type": "stat"
},
{
"cacheTimeout": null,
"datasource": "PROMETHEUS",
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"custom": {},
"mappings": [
{
"id": 0,
"op": "=",
"text": "N/A",
"type": 1,
"value": "null"
}
],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "semi-dark-orange",
"value": null
}
]
},
"unit": "none"
},
"overrides": []
},
"gridPos": {
"h": 2,
"w": 4,
"x": 4,
"y": 4
},
"id": 5,
"interval": null,
"links": [],
"maxDataPoints": 100,
"options": {
"colorMode": "background",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "horizontal",
"reduceOptions": {
"calcs": [],
"fields": "",
"values": false
},
"text": {},
"textMode": "auto"
},
"pluginVersion": "7.4.5",
"targets": [
{
"expr": "sum(ALERTS{alertstate=\"firing\",severity=\"warning\"} > 0) OR on() vector(0)",
"format": "time_series",
"instant": true,
"interval": "",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"title": "Warning",
"type": "stat"
},
{
"cacheTimeout": null,
"datasource": "PROMETHEUS",
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"custom": {},
"mappings": [
{
"id": 0,
"op": "=",
"text": "N/A",
"type": 1,
"value": "null"
}
],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "#299c46",
"value": null
}
]
},
"unit": "none"
},
"overrides": []
},
"gridPos": {
"h": 2,
"w": 4,
"x": 8,
"y": 4
},
"id": 9,
"interval": null,
"links": [],
"maxDataPoints": 100,
"options": {
"colorMode": "background",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "horizontal",
"reduceOptions": {
"calcs": [
"mean"
],
"fields": "",
"values": false
},
"text": {},
"textMode": "auto"
},
"pluginVersion": "7.4.5",
"targets": [
{
"expr": "sum(ALERTS{alertstate=\"firing\",severity=\"info\"} > 0) OR on() vector(0)",
"format": "time_series",
"interval": "",
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"title": "Info",
"type": "stat"
},
{
"collapsed": false,
"datasource": "PROMETHEUS",
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 6
},
"id": 12,
"panels": [],
"repeat": null,
"title": "Alerts",
"type": "row"
},
{
"datasource": "PROMETHEUS",
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"custom": {
"align": null,
"displayMode": "auto",
"filterable": true
},
"decimals": 2,
"displayName": "",
"mappings": [
{
"from": "",
"id": 1,
"text": "",
"to": "",
"type": 1,
"value": ""
}
],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "blue",
"value": 100
},
{
"color": "#EAB839",
"value": 200
},
{
"color": "red",
"value": 300
}
]
},
"unit": "short"
},
"overrides": [
{
"matcher": {
"id": "byName",
"options": "severity_num"
},
"properties": [
{
"id": "custom.displayMode",
"value": "color-background"
},
{
"id": "custom.width",
"value": 124
}
]
},
{
"matcher": {
"id": "byName",
"options": "Time"
},
"properties": [
{
"id": "custom.width",
"value": 170
}
]
},
{
"matcher": {
"id": "byName",
"options": "severity"
},
"properties": [
{
"id": "custom.width",
"value": 119
}
]
},
{
"matcher": {
"id": "byName",
"options": "alertname"
},
"properties": [
{
"id": "custom.width",
"value": 206
}
]
},
{
"matcher": {
"id": "byName",
"options": "alertstate"
},
"properties": [
{
"id": "custom.width",
"value": 128
}
]
}
]
},
"gridPos": {
"h": 5,
"w": 24,
"x": 0,
"y": 7
},
"id": 1,
"links": [],
"options": {
"showHeader": true,
"sortBy": []
},
"pluginVersion": "7.4.5",
"targets": [
{
"expr": "ALERTS{alertstate='firing'} > 0",
"format": "table",
"instant": true,
"interval": "2s",
"intervalFactor": 1,
"legendFormat": "",
"refId": "A"
}
],
"title": "Firing",
"transformations": [
{
"id": "merge",
"options": {
"reducers": []
}
},
{
"id": "organize",
"options": {
"excludeByName": {
"Value": true,
"__name__": true,
"alertstate": false,
"deployment": false,
"exp_type": true,
"fs_type": true,
"instance": true,
"job": true,
"kubernetes_namespace": true,
"mount_point": true,
"server": true,
"service": true,
"severity_num": false
},
"indexByName": {
"Time": 0,
"Value": 16,
"__name__": 3,
"alertname": 4,
"alertstate": 5,
"deployment": 7,
"exp_type": 9,
"instance": 10,
"ip": 11,
"job": 12,
"kubernetes_namespace": 13,
"pg_cluster": 6,
"pod": 8,
"role": 14,
"service": 15,
"severity": 2,
"severity_num": 1
},
"renameByName": {
"Time": "",
"__name__": "",
"severity": "",
"severity_num": ""
}
}
}
],
"type": "table"
},
{
"datasource": "PROMETHEUS",
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"custom": {
"align": null,
"filterable": true
},
"decimals": 2,
"displayName": "",
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "short"
},
"overrides": [
{
"matcher": {
"id": "byRegexp",
"options": "/(instance|__name__|Time|alertstate|job|type|Value)/"
},
"properties": [
{
"id": "unit",
"value": "short"
},
{
"id": "decimals",
"value": 2
},
{
"id": "custom.align",
"value": null
}
]
},
{
"matcher": {
"id": "byName",
"options": "Time"
},
"properties": [
{
"id": "custom.width",
"value": null
}
]
},
{
"matcher": {
"id": "byName",
"options": "severity_num"
},
"properties": [
{
"id": "custom.width",
"value": 126
}
]
},
{
"matcher": {
"id": "byName",
"options": "severity"
},
"properties": [
{
"id": "custom.width",
"value": 115
}
]
},
{
"matcher": {
"id": "byName",
"options": "alertname"
},
"properties": [
{
"id": "custom.width",
"value": 207
}
]
},
{
"matcher": {
"id": "byName",
"options": "alertstate"
},
"properties": [
{
"id": "custom.width",
"value": 131
}
]
}
]
},
"gridPos": {
"h": 7,
"w": 24,
"x": 0,
"y": 12
},
"id": 3,
"links": [],
"options": {
"showHeader": true,
"sortBy": []
},
"pluginVersion": "7.4.5",
"targets": [
{
"expr": "ALERTS{alertstate=\"pending\"}",
"format": "table",
"instant": false,
"interval": "",
"intervalFactor": 1,
"legendFormat": "",
"refId": "A"
}
],
"title": "Alerts (1 week)",
"transformations": [
{
"id": "organize",
"options": {
"excludeByName": {
"Value": true,
"__name__": true,
"exp_type": true,
"instance": true,
"job": true,
"kubernetes_namespace": true,
"service": true
},
"indexByName": {
"Time": 0,
"Value": 16,
"__name__": 3,
"alertname": 4,
"alertstate": 5,
"deployment": 7,
"exp_type": 8,
"instance": 9,
"ip": 11,
"job": 12,
"kubernetes_namespace": 13,
"pg_cluster": 6,
"pod": 10,
"role": 14,
"service": 15,
"severity": 2,
"severity_num": 1
},
"renameByName": {}
}
}
],
"type": "table"
}
],
"refresh": "15m",
"schemaVersion": 27,
"style": "dark",
"tags": [
"vendor=crunchydata"
],
"templating": {
"list": []
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "browser",
"title": "Prometheus Alerts",
"uid": "lwxXsZsMk",
"version": 1
}
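
The `severity_num` threshold steps in the "Firing" table (base, 100, 200, 300) line up with the `severity_num` labels the alert rules attach (200 for warning, 300 for critical). The following Python sketch illustrates how absolute threshold steps map a value to a color, assuming the usual rule that the last step at or below the value wins. `color_for` and `STEPS` are hypothetical names for illustration, not Grafana code.

```python
from typing import Optional

# Threshold steps copied from the "Firing" table panel's fieldConfig.
# A threshold of None is the base step that applies to every value.
STEPS = [
    (None, "green"),
    (100, "blue"),
    (200, "#EAB839"),
    (300, "red"),
]


def color_for(value: float, steps=STEPS) -> str:
    """Return the color of the last step whose threshold is <= value.

    Assumes the steps are listed in ascending threshold order, as they
    are in the dashboard JSON.
    """
    chosen: Optional[str] = None
    for threshold, color in steps:
        if threshold is None or value >= threshold:
            chosen = color
    if chosen is None:
        raise ValueError("no base step defined")
    return chosen


if __name__ == "__main__":
    for severity_num in (0, 200, 300):
        print(severity_num, color_for(severity_num))
```

Under that assumption a warning alert (200) renders with the `#EAB839` background and a critical alert (300) renders red.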

@@ -0,0 +1,64 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: crunchy-grafana
spec:
  selector: {}
  template:
    spec:
      containers:
      - name: grafana
        image: grafana/grafana:9.2.20
        ports:
        - containerPort: 3000
        env:
        - name: GF_PATHS_DATA
          value: /data/grafana/data
        - name: GF_SECURITY_ADMIN_USER__FILE
          value: /conf/admin/username
        - name: GF_SECURITY_ADMIN_PASSWORD__FILE
          value: /conf/admin/password
        - name: PROM_HOST
          value: crunchy-prometheus
        - name: PROM_PORT
          value: "9090"
        livenessProbe:
          httpGet:
            path: /api/health
            port: 3000
          initialDelaySeconds: 25
          periodSeconds: 20
        readinessProbe:
          httpGet:
            path: /api/health
            port: 3000
        volumeMounts:
        - mountPath: /data
          name: grafanadata
        - mountPath: /conf/admin
          name: grafana-admin
        - mountPath: /etc/grafana/provisioning/datasources
          name: grafana-datasources
        - mountPath: /etc/grafana/provisioning/dashboards
          name: grafana-dashboards
      securityContext:
        fsGroup: 26
        # supplementalGroups:
        #   - 65534
      serviceAccountName: grafana
      volumes:
      - name: grafanadata
        persistentVolumeClaim:
          claimName: grafanadata
      - name: grafana-admin
        secret:
          defaultMode: 420
          secretName: grafana-admin
      - name: grafana-datasources
        configMap:
          defaultMode: 420
          name: grafana-datasources
      - name: grafana-dashboards
        configMap:
          defaultMode: 420
          name: grafana-dashboards

@@ -0,0 +1,33 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
labels:
- includeSelectors: true
  pairs:
    app.kubernetes.io/component: crunchy-grafana
resources:
- deployment.yaml
- pvc.yaml
- service.yaml
- serviceaccount.yaml
- dashboards
configMapGenerator:
- name: grafana-datasources
  files:
  - config/crunchy_grafana_datasource.yml
- name: grafana-dashboards
  behavior: merge
  files:
  - config/crunchy_grafana_dashboards.yml
secretGenerator:
- name: grafana-admin
  literals:
  - password=admin
  - username=admin
  type: Opaque
generatorOptions:
  disableNameSuffixHash: true

@@ -0,0 +1,10 @@
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafanadata
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi

@@ -0,0 +1,9 @@
apiVersion: v1
kind: Service
metadata:
  name: crunchy-grafana
spec:
  type: ClusterIP
  ports:
  - name: grafana
    port: 3000

@@ -0,0 +1,4 @@
apiVersion: v1
kind: ServiceAccount
metadata:
  name: grafana

@@ -0,0 +1,14 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: prod-iam
labels:
- includeSelectors: true
  pairs:
    app.kubernetes.io/name: prod-iam-obs
    vendor: holos
resources:
- grafana
- prometheus
- alertmanager

@@ -0,0 +1,9 @@
package holos
spec: components: KustomizeBuildList: [
#KustomizeBuild & {
_dependsOn: "prod-secrets-stores": _
metadata: name: "prod-iam-obs"
},
]

@@ -0,0 +1,13 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- resources:
  - pods
  apiGroups:
  - ""
  verbs:
  - get
  - list
  - watch

@@ -0,0 +1,11 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus

@@ -0,0 +1,418 @@
###
#
# Copyright © 2017-2024 Crunchy Data Solutions, Inc. All Rights Reserved.
#
###
groups:
- name: alert-rules
rules:
########## EXPORTER RULES ##########
- alert: PGExporterScrapeError
expr: pg_exporter_last_scrape_error > 0
for: 60s
labels:
service: postgresql
severity: critical
severity_num: 300
annotations:
summary: 'Postgres Exporter running on {{ $labels.job }} (instance: {{ $labels.instance }}) is encountering scrape errors processing queries. Error count: ( {{ $value }} )'
########## SYSTEM RULES ##########
- alert: ExporterDown
expr: avg_over_time(up[5m]) < 0.5
for: 10s
labels:
service: system
severity: critical
severity_num: 300
annotations:
description: 'Metrics exporter service for {{ $labels.job }} running on {{ $labels.instance }} has been down at least 50% of the time for the last 5 minutes. Service may be flapping or down.'
summary: 'Prometheus Exporter Service Down'
########## POSTGRESQL RULES ##########
- alert: PGIsUp
expr: pg_up < 1
for: 60s
labels:
service: postgresql
severity: critical
severity_num: 300
annotations:
summary: 'postgres_exporter running on {{ $labels.job }} is unable to communicate with the configured database'
# Example to check for current version of PostgreSQL. Metric returns the version that the exporter is running on, so you can set a rule to check for the minimum version you'd like all systems to be on. Number returned is the 6 digit integer representation contained in the setting "server_version_num".
#
# - alert: PGMinimumVersion
# expr: ccp_postgresql_version_current < 110005
# for: 60s
# labels:
# service: postgresql
# severity: critical
# severity_num: 300
# annotations:
# summary: '{{ $labels.job }} is not running at least version 11.5 of PostgreSQL'
# Whether a system switches from primary to replica or vice versa must be configured per named job.
# No way to tell what value a system is supposed to be without a rule expression for that specific system
# 2 to 1 means it changed from primary to replica. 1 to 2 means it changed from replica to primary
# Set this alert for each system that you want to monitor a recovery status change
# Below is an example for a target job called "Replica" and watches for the value to change above 1 which means it's no longer a replica
#
# - alert: PGRecoveryStatusSwitch_Replica
# expr: ccp_is_in_recovery_status{job="Replica"} > 1
# for: 60s
# labels:
# service: postgresql
# severity: critical
# severity_num: 300
# annotations:
# summary: '{{ $labels.job }} has changed from replica to primary'
# Absence alerts must be configured per named job, otherwise there's no way to know which job is down
# Below is an example for a target job called "Prod"
# - alert: PGConnectionAbsent_Prod
# expr: absent(ccp_connection_stats_max_connections{job="Prod"})
# for: 10s
# labels:
# service: postgresql
# severity: critical
# severity_num: 300
# annotations:
# description: 'Connection metric is absent from target (Prod). Check that postgres_exporter can connect to PostgreSQL.'
# Optional monitor for changes to pg_settings (postgresql.conf) system catalog.
# A similar metric is available for monitoring pg_hba.conf. See ccp_hba_settings_checksum().
# If metric returns 0, then NO settings have changed in pg_settings since the last known valid state
# If metric returns 1, then pg_settings have changed since last known valid state
# To see what may have changed, check the monitor.pg_settings_checksum table for a history of config state.
# - alert: PGSettingsChecksum
# expr: ccp_pg_settings_checksum > 0
# for: 60s
# labels:
# service: postgresql
# severity: critical
# severity_num: 300
# annotations:
# description: 'Configuration settings on {{ $labels.job }} have changed from previously known valid state. To reset current config to a valid state after alert fires, run monitor.pg_settings_checksum_set_valid().'
# summary: 'PGSQL Instance settings checksum'
# Monitor for data block checksum failures. Only works in PG12+
# - alert: PGDataChecksum
# expr: ccp_data_checksum_failure > 0
# for: 60s
# labels:
# service: postgresql
# severity: critical
# severity_num: 300
# annotations:
# description: '{{ $labels.job }} has at least one data checksum failure in database {{ $labels.dbname }}. See pg_stat_database system catalog for more information.'
# summary: 'PGSQL Data Checksum failure'
- alert: PGIdleTxn
expr: ccp_connection_stats_max_idle_in_txn_time > 300
for: 60s
labels:
service: postgresql
severity: warning
severity_num: 200
annotations:
description: '{{ $labels.job }} has at least one session idle in transaction for over 5 minutes.'
summary: 'PGSQL Instance idle transactions'
- alert: PGIdleTxn
expr: ccp_connection_stats_max_idle_in_txn_time > 900
for: 60s
labels:
service: postgresql
severity: critical
severity_num: 300
annotations:
description: '{{ $labels.job }} has at least one session idle in transaction for over 15 minutes.'
summary: 'PGSQL Instance idle transactions'
- alert: PGQueryTime
expr: ccp_connection_stats_max_query_time > 43200
for: 60s
labels:
service: postgresql
severity: warning
severity_num: 200
annotations:
description: '{{ $labels.job }} has at least one query running for over 12 hours.'
summary: 'PGSQL Max Query Runtime'
- alert: PGQueryTime
expr: ccp_connection_stats_max_query_time > 86400
for: 60s
labels:
service: postgresql
severity: critical
severity_num: 300
annotations:
description: '{{ $labels.job }} has at least one query running for over 1 day.'
summary: 'PGSQL Max Query Runtime'
- alert: PGConnPerc
expr: 100 * (ccp_connection_stats_total / ccp_connection_stats_max_connections) > 75
for: 60s
labels:
service: postgresql
severity: warning
severity_num: 200
annotations:
description: '{{ $labels.job }} is using 75% or more of available connections ({{ $value }}%)'
summary: 'PGSQL Instance connections'
- alert: PGConnPerc
expr: 100 * (ccp_connection_stats_total / ccp_connection_stats_max_connections) > 90
for: 60s
labels:
service: postgresql
severity: critical
severity_num: 300
annotations:
description: '{{ $labels.job }} is using 90% or more of available connections ({{ $value }}%)'
summary: 'PGSQL Instance connections'
- alert: DiskFillPredict
expr: predict_linear(ccp_nodemx_data_disk_available_bytes{mount_point!~"tmpfs"}[1h], 24 * 3600) < 0 and 100 * ((ccp_nodemx_data_disk_total_bytes - ccp_nodemx_data_disk_available_bytes) / ccp_nodemx_data_disk_total_bytes) > 70
for: 5m
labels:
service: postgresql
severity: warning
severity_num: 200
annotations:
summary: 'Disk predicted to be full in 24 hours'
description: 'Disk on {{ $labels.pg_cluster }}:{{ $labels.kubernetes_pod_name }} is predicted to fill in 24 hrs based on current usage'
- alert: PGClusterRoleChange
expr: count by (pg_cluster) (ccp_is_in_recovery_status != ignoring(instance,ip,pod,role) (ccp_is_in_recovery_status offset 5m)) >= 1
for: 60s
labels:
service: postgresql
severity: critical
severity_num: 300
annotations:
summary: '{{ $labels.pg_cluster }} has had a switchover/failover event. Please check this cluster for more details'
- alert: PGDiskSize
expr: 100 * ((ccp_nodemx_data_disk_total_bytes - ccp_nodemx_data_disk_available_bytes) / ccp_nodemx_data_disk_total_bytes) > 75
for: 60s
labels:
service: postgresql
severity: warning
severity_num: 200
annotations:
description: 'PGSQL Instance {{ $labels.deployment }} over 75% disk usage at mount point "{{ $labels.mount_point }}": {{ $value }}%'
summary: PGSQL Instance usage warning
- alert: PGDiskSize
expr: 100 * ((ccp_nodemx_data_disk_total_bytes - ccp_nodemx_data_disk_available_bytes) / ccp_nodemx_data_disk_total_bytes) > 90
for: 60s
labels:
service: postgresql
severity: critical
severity_num: 300
annotations:
description: 'PGSQL Instance {{ $labels.deployment }} over 90% disk usage at mount point "{{ $labels.mount_point }}": {{ $value }}%'
summary: 'PGSQL Instance size critical'
- alert: PGReplicationByteLag
expr: ccp_replication_lag_size_bytes > 5.24288e+07
for: 60s
labels:
service: postgresql
severity: warning
severity_num: 200
annotations:
description: 'PGSQL Instance {{ $labels.job }} has at least one replica lagging over 50MB behind.'
summary: 'PGSQL Instance replica lag warning'
- alert: PGReplicationByteLag
expr: ccp_replication_lag_size_bytes > 1.048576e+08
for: 60s
labels:
service: postgresql
severity: critical
severity_num: 300
annotations:
description: 'PGSQL Instance {{ $labels.job }} has at least one replica lagging over 100MB behind.'
summary: 'PGSQL Instance replica lag critical'
- alert: PGReplicationSlotsInactive
expr: ccp_replication_slots_active == 0
for: 60s
labels:
service: postgresql
severity: critical
severity_num: 300
annotations:
description: 'PGSQL Instance {{ $labels.job }} has one or more inactive replication slots'
summary: 'PGSQL Instance inactive replication slot'
- alert: PGXIDWraparound
expr: ccp_transaction_wraparound_percent_towards_wraparound > 50
for: 60s
labels:
service: postgresql
severity: warning
severity_num: 200
annotations:
description: 'PGSQL Instance {{ $labels.job }} is over 50% towards transaction id wraparound.'
summary: 'PGSQL Instance {{ $labels.job }} transaction id wraparound imminent'
- alert: PGXIDWraparound
expr: ccp_transaction_wraparound_percent_towards_wraparound > 75
for: 60s
labels:
service: postgresql
severity: critical
severity_num: 300
annotations:
description: 'PGSQL Instance {{ $labels.job }} is over 75% towards transaction id wraparound.'
summary: 'PGSQL Instance transaction id wraparound imminent'
- alert: PGEmergencyVacuum
expr: ccp_transaction_wraparound_percent_towards_emergency_autovac > 110
for: 60s
labels:
service: postgresql
severity: warning
severity_num: 200
annotations:
description: 'PGSQL Instance {{ $labels.job }} is over 110% beyond autovacuum_freeze_max_age value. Autovacuum may need tuning to better keep up.'
summary: 'PGSQL Instance emergency vacuum imminent'
- alert: PGEmergencyVacuum
expr: ccp_transaction_wraparound_percent_towards_emergency_autovac > 125
for: 60s
labels:
service: postgresql
severity: critical
severity_num: 300
annotations:
description: 'PGSQL Instance {{ $labels.job }} is over 125% beyond autovacuum_freeze_max_age value. Autovacuum needs tuning to better keep up.'
summary: 'PGSQL Instance emergency vacuum imminent'
- alert: PGArchiveCommandStatus
expr: ccp_archive_command_status_seconds_since_last_fail > 300
for: 60s
labels:
service: postgresql
severity: critical
severity_num: 300
annotations:
description: 'PGSQL Instance {{ $labels.job }} has a recent failing archive command'
summary: 'Seconds since the last recorded failure of the archive_command'
- alert: PGSequenceExhaustion
expr: ccp_sequence_exhaustion_count > 0
for: 60s
labels:
service: postgresql
severity: critical
severity_num: 300
annotations:
description: 'Count of sequences on instance {{ $labels.job }} at over 75% usage: {{ $value }}. Run following query to see full sequence status: SELECT * FROM monitor.sequence_status() WHERE percent >= 75'
- alert: PGSettingsPendingRestart
expr: ccp_settings_pending_restart_count > 0
for: 60s
labels:
service: postgresql
severity: critical
severity_num: 300
annotations:
description: 'One or more settings in the pg_settings system catalog on system {{ $labels.job }} are in a pending_restart state. Check the system catalog for which settings are pending and review postgresql.conf for changes.'
########## PGBACKREST RULES ##########
#
# Uncomment and customize one or more of these rules to monitor your pgbackrest backups.
# Full backups are considered the equivalent of both differentials and incrementals since both are based on the last full
# And differentials are considered incrementals since incrementals will be based off the last diff if one exists
# This avoids false alerts, for example when you don't run diff/incr backups on the days that you run a full
# Stanza should also be set if different intervals are expected for each stanza.
# Otherwise rule will be applied to all stanzas returned on target system if not set.
#
# Relevant metric names are:
# ccp_backrest_last_full_backup_time_since_completion_seconds
# ccp_backrest_last_incr_backup_time_since_completion_seconds
# ccp_backrest_last_diff_backup_time_since_completion_seconds
#
# To avoid false positives on backup time alerts, 12 hours are added onto each threshold to allow a buffer if the backup runtime varies from day to day.
# Further adjustment may be needed depending on your backup runtimes/schedule.
#
# - alert: PGBackRestLastCompletedFull_main
# expr: ccp_backrest_last_full_backup_time_since_completion_seconds{stanza="main"} > 648000
# for: 60s
# labels:
# service: postgresql
# severity: critical
# severity_num: 300
# annotations:
# summary: 'Full backup for stanza [main] on system {{ $labels.job }} has not completed in the last week.'
#
# - alert: PGBackRestLastCompletedIncr_main
# expr: ccp_backrest_last_incr_backup_time_since_completion_seconds{stanza="main"} > 129600
# for: 60s
# labels:
# service: postgresql
# severity: critical
# severity_num: 300
# annotations:
# summary: 'Incremental backup for stanza [main] on system {{ $labels.job }} has not completed in the last 24 hours.'
#
#
# Runtime monitoring is handled with a single metric:
#
# ccp_backrest_last_info_backup_runtime_seconds
#
# Runtime monitoring should have the "backup_type" label set.
# Otherwise the rule will apply to the last run of all backup types returned (full, diff, incr)
# Stanza should also be set if runtimes per stanza have different expected times
#
# - alert: PGBackRestLastRuntimeFull_main
# expr: ccp_backrest_last_info_backup_runtime_seconds{backup_type="full", stanza="main"} > 14400
# for: 60s
# labels:
# service: postgresql
# severity: critical
# severity_num: 300
# annotations:
# summary: 'Expected runtime of full backup for stanza [main] has exceeded 4 hours'
#
# - alert: PGBackRestLastRuntimeDiff_main
# expr: ccp_backrest_last_info_backup_runtime_seconds{backup_type="diff", stanza="main"} > 3600
# for: 60s
# labels:
# service: postgresql
# severity: critical
# severity_num: 300
# annotations:
# summary: 'Expected runtime of diff backup for stanza [main] has exceeded 1 hour'
##
#
## If the pgbackrest command fails to run, the metric disappears from the exporter output and the alert never fires.
## An absence alert must be configured explicitly for each target (job) that backups are being monitored.
## Checking for absence of just the full backup type should be sufficient (no need for diff/incr).
## Note that while the backrest check command failing will likely also cause a scrape error alert, the addition of this
## check gives a clearer answer as to what is causing it and that something is wrong with the backups.
#
# - alert: PGBackrestAbsentFull_Prod
# expr: absent(ccp_backrest_last_full_backup_time_since_completion_seconds{job="Prod"})
# for: 10s
# labels:
# service: postgresql
# severity: critical
# severity_num: 300
# annotations:
# description: 'Backup Full status missing for Prod. Check that pgbackrest info command is working on target system.'
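
The numeric thresholds in the rules above follow from simple arithmetic: the pgBackRest examples add a 12 hour buffer to the expected backup interval, and the replication lag limits are 50 and 100 MiB written in scientific notation. The Python sketch below reproduces those numbers. The helper name `backup_threshold` and the constants are hypothetical, introduced only for illustration.

```python
HOUR = 3600
DAY = 24 * HOUR
BUFFER = 12 * HOUR  # slack the rule comments add to each backup threshold
MIB = 1024 ** 2


def backup_threshold(expected_interval_seconds: int,
                     buffer_seconds: int = BUFFER) -> int:
    """Seconds since the last completed backup before an alert should fire."""
    return expected_interval_seconds + buffer_seconds


# Weekly full backup plus buffer: matches "> 648000" in PGBackRestLastCompletedFull_main.
full_threshold = backup_threshold(7 * DAY)

# Daily incremental backup plus buffer: matches "> 129600" in PGBackRestLastCompletedIncr_main.
incr_threshold = backup_threshold(1 * DAY)

# Replication lag limits used by the two PGReplicationByteLag rules.
lag_warning_bytes = 50 * MIB    # written as 5.24288e+07 in the rule
lag_critical_bytes = 100 * MIB  # written as 1.048576e+08 in the rule

if __name__ == "__main__":
    print(full_threshold, incr_threshold, lag_warning_bytes, lag_critical_bytes)
```

If the backup schedule differs, for example full backups every two weeks, the same calculation yields the threshold to substitute into the rule expression.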

@@ -0,0 +1,85 @@
###
#
# Copyright © 2017-2024 Crunchy Data Solutions, Inc. All Rights Reserved.
#
###
---
global:
scrape_interval: 15s
scrape_timeout: 15s
evaluation_interval: 5s
scrape_configs:
- job_name: 'crunchy-postgres-exporter'
kubernetes_sd_configs:
- role: pod
selectors:
- role: pod
label: postgres-operator.crunchydata.com/crunchy-postgres-exporter=true
relabel_configs:
# Keep exporter port and drop all others
- source_labels: [__meta_kubernetes_pod_container_port_number]
action: keep
regex: 9187
# Set label for namespace
- source_labels: [__meta_kubernetes_namespace]
target_label: kubernetes_namespace
# Set label for pod name
- source_labels: [__meta_kubernetes_pod_name]
target_label: pod
# Convert namespace and cluster name to pg_cluster=namespace:cluster
- source_labels: [__meta_kubernetes_namespace,__meta_kubernetes_pod_label_postgres_operator_crunchydata_com_cluster]
target_label: pg_cluster
separator: ":"
replacement: '$1$2'
# Convert kubernetes pod ip to ip
- source_labels: [__meta_kubernetes_pod_ip]
target_label: ip
# Convert postgres-operator.crunchydata.com/instance to deployment
- source_labels: [__meta_kubernetes_pod_label_postgres_operator_crunchydata_com_instance]
target_label: deployment
# Convert postgres-operator.crunchydata.com/role to role
- source_labels: [__meta_kubernetes_pod_label_postgres_operator_crunchydata_com_role]
target_label: role
- job_name: 'crunchy-postgres-exporter-v4'
kubernetes_sd_configs:
- role: pod
selectors:
- role: pod
label: crunchy-postgres-exporter=true
relabel_configs:
# Keep exporter port and drop all others
- source_labels: [__meta_kubernetes_pod_container_port_number]
action: keep
regex: 9187
# Set label for namespace
- source_labels: [__meta_kubernetes_namespace]
target_label: kubernetes_namespace
# Set label for pod name
- source_labels: [__meta_kubernetes_pod_name]
target_label: pod
# Convert namespace and cluster name to pg_cluster=namespace:cluster
- source_labels: [__meta_kubernetes_namespace,__meta_kubernetes_pod_label_pg_cluster]
target_label: pg_cluster
separator: ":"
replacement: '$1$2'
# Convert kubernetes pod ip to ip
- source_labels: [__meta_kubernetes_pod_ip]
target_label: ip
# Set deployment_name as deployment label
- source_labels: [__meta_kubernetes_pod_label_deployment_name]
target_label: deployment
# Set label for role
- source_labels: [__meta_kubernetes_pod_label_role]
target_label: role
rule_files:
- /etc/prometheus/alert-rules.d/*.yml
alerting:
alertmanagers:
- kubernetes_sd_configs:
- role: pod
selectors:
- role: pod
label: app.kubernetes.io/component=crunchy-alertmanager
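
The `pg_cluster` relabel rules in the scrape config above join two source labels with `separator: ":"` and then apply `replacement: '$1$2'` without setting `regex`. A minimal Go sketch of why that still yields `namespace:cluster`, assuming the default relabel regex `(.*)` is applied fully anchored to the joined value. The helper name `relabel` and the sample label values are hypothetical and are not part of this change.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// relabel is a hypothetical helper that mimics a single "replace" relabel
// step: join the source label values with the separator, match the default
// regex, then expand the replacement template.
func relabel(sourceValues []string, separator, replacement string) string {
	joined := strings.Join(sourceValues, separator)
	// Assumption: the default regex (.*) is anchored at both ends.
	re := regexp.MustCompile(`^(?:(.*))$`)
	match := re.FindStringSubmatchIndex(joined)
	if match == nil {
		return ""
	}
	// The regex has one capture group, so $1 is the whole joined string
	// and $2 expands to the empty string.
	return string(re.ExpandString(nil, replacement, joined, match))
}

func main() {
	fmt.Println(relabel([]string{"prod-iam", "db"}, ":", "$1$2"))
}
```

Under these assumptions the separator does the work of combining the two labels, and `$2` contributes nothing.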


@@ -0,0 +1,47 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: crunchy-prometheus
spec:
selector: {}
template:
spec:
containers:
- name: prometheus
image: prom/prometheus:v2.39.2
ports:
- containerPort: 9090
livenessProbe:
httpGet:
path: /-/healthy
port: 9090
initialDelaySeconds: 15
periodSeconds: 20
readinessProbe:
httpGet:
path: /-/ready
port: 9090
volumeMounts:
- mountPath: /etc/prometheus
name: prometheusconf
- mountPath: /prometheus
name: prometheusdata
- mountPath: /etc/prometheus/alert-rules.d
name: alertmanagerrules
securityContext:
fsGroup: 26
# supplementalGroups:
# - 65534
serviceAccountName: prometheus
volumes:
- name: prometheusconf
configMap:
defaultMode: 420
name: crunchy-prometheus
- name: prometheusdata
persistentVolumeClaim:
claimName: prometheusdata
- name: alertmanagerrules
configMap:
defaultMode: 420
name: alert-rules-config


@@ -0,0 +1,26 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
labels:
- includeSelectors: true
pairs:
app.kubernetes.io/component: crunchy-prometheus
resources:
- deployment.yaml
- pvc.yaml
- service.yaml
- serviceaccount.yaml
- clusterrole.yaml
- clusterrolebinding.yaml
configMapGenerator:
- name: crunchy-prometheus
files:
- config/prometheus.yml
- name: alert-rules-config
files:
- config/crunchy-alert-rules-pg.yml
generatorOptions:
disableNameSuffixHash: true


@@ -0,0 +1,10 @@
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: prometheusdata
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 5Gi


@@ -0,0 +1,9 @@
apiVersion: v1
kind: Service
metadata:
name: crunchy-prometheus
spec:
type: ClusterIP
ports:
- name: prometheus
port: 9090


@@ -0,0 +1,4 @@
apiVersion: v1
kind: ServiceAccount
metadata:
name: prometheus


@@ -20,7 +20,17 @@ let SecretNames = {
},
]
#KubernetesObjects & {
spec: components: KubernetesObjectsList: [
#KubernetesObjects & {
metadata: name: "prod-iam-postgres-certs"
_dependsOn: "prod-secrets-stores": _
apiObjectMap: OBJECTS.apiObjectMap
},
]
let OBJECTS = #APIObjects & {
apiObjects: {
for s in SecretNames {
ExternalSecret: "\(s.name)": _


@@ -33,7 +33,17 @@ let RestoreOptions = []
},
]
#KubernetesObjects & {
spec: components: KubernetesObjectsList: [
#KubernetesObjects & {
metadata: name: "prod-iam-postgres"
_dependsOn: "prod-secrets-stores": _
_dependsOn: "prod-iam-postgres-certs": _
apiObjectMap: OBJECTS.apiObjectMap
},
]
let OBJECTS = #APIObjects & {
apiObjects: {
ExternalSecret: "\(S3Secret)": _
PostgresCluster: db: #PostgresCluster & HighlyAvailable & {
@@ -58,7 +68,7 @@ let RestoreOptions = []
replicas: 2
dataVolumeClaimSpec: {
accessModes: ["ReadWriteOnce"]
resources: requests: storage: "10Gi"
resources: requests: storage: "20Gi"
}
}]
standby: {
@@ -70,6 +80,8 @@ let RestoreOptions = []
enabled: true
}
}
// Monitoring configuration
monitoring: pgmonitor: exporter: image: "registry.developers.crunchydata.com/crunchydata/crunchy-postgres-exporter:ubi8-5.5.1-0"
// Restore from backup if and only if the cluster is primary
if Cluster.primary {
dataSource: pgbackrest: {
@@ -114,7 +126,7 @@ let RestoreOptions = []
"\(BucketRepoName)-cipher-type": "aes-256-cbc"
// "The convention we recommend for setting this variable is /pgbackrest/$NAMESPACE/$CLUSTER_NAME/repoN"
// Ref: https://access.crunchydata.com/documentation/postgres-operator/latest/tutorials/backups-disaster-recovery/backups#understanding-backup-configuration-and-basic-operations
"\(BucketRepoName)-path": "/pgbackrest/\(#TargetNamespace)/\(metadata.name)/\(manual.repoName)"
"\(BucketRepoName)-path": "/pgbackrest/\(metadata.namespace)/\(metadata.name)/\(manual.repoName)"
}
repos: [
{
@@ -155,7 +167,7 @@ let HighlyAvailable = {
replicas: 2
dataVolumeClaimSpec: {
accessModes: ["ReadWriteOnce"]
resources: requests: storage: string | *"10Gi"
resources: requests: storage: string | *"20Gi"
}
affinity: podAntiAffinity: preferredDuringSchedulingIgnoredDuringExecution: [{
weight: 1


@@ -1,6 +1,9 @@
package holos
#TargetNamespace: #InstancePrefix + "-zitadel"
#InstancePrefix: "prod-iam"
// The namespace is managed by a project.
#TargetNamespace: _Projects.iam.environments.prod.namespace
// _DBName is the database name used across multiple holos components in this project
_DBName: "zitadel"


@@ -74,7 +74,7 @@ package holos
repository: "ghcr.io/zitadel/zitadel"
pullPolicy: "IfNotPresent"
// Overrides the image tag whose default is the chart appVersion.
tag: ""
tag: string | *""
}
chownImage: {


@@ -1,6 +1,10 @@
package holos
#Values: {
// https://github.com/zitadel/zitadel/releases
// Overrides the image tag whose default is the chart appVersion.
image: tag: "v2.49.1"
// Database credentials
// Refer to https://access.crunchydata.com/documentation/postgres-operator/5.2.0/architecture/user-management/
// Refer to https://zitadel.com/docs/self-hosting/manage/database#postgres
@@ -74,6 +78,12 @@ package holos
ExternalPort: 443
TLS: Enabled: false
// Fix AuthProxy JWKS Error - Jwks doesn't have key to match kid or alg from Jwt
// Refer to: https://github.com/holos-run/holos/issues/96
// Refer to: https://github.com/zitadel/zitadel/discussions/7464
SystemDefaults: KeyConfig: PrivateKeyLifetime: "999999h"
SystemDefaults: KeyConfig: PublicKeyLifetime: "999999h"
// Database connection credentials are injected via environment variables from the db-pguser-db secret.
Database: postgres: {
MaxOpenConns: 25


@@ -4,50 +4,30 @@ import "encoding/yaml"
let Name = "zitadel"
#InputKeys: component: Name
#DependsOn: postgres: _
// Upstream helm chart doesn't specify the namespace field for all resources.
#Kustomization: spec: {
targetNamespace: #TargetNamespace
wait: false
}
spec: components: HelmChartList: [
#HelmChart & {
metadata: name: "\(#InstancePrefix)-zitadel"
if #IsPrimaryCluster == true {
#Kustomization: spec: healthChecks: [
{
apiVersion: "apps/v1"
kind: "Deployment"
name: Name
namespace: #TargetNamespace
},
{
apiVersion: "batch/v1"
kind: "Job"
name: "\(Name)-init"
namespace: #TargetNamespace
},
{
apiVersion: "batch/v1"
kind: "Job"
name: "\(Name)-setup"
namespace: #TargetNamespace
},
]
}
_dependsOn: "prod-secrets-stores": _
_dependsOn: "\(#InstancePrefix)-postgres": _
#HelmChart & {
namespace: #TargetNamespace
enableHooks: true
chart: {
name: Name
version: "7.9.0"
repository: {
name: Name
url: "https://charts.zitadel.com"
namespace: #TargetNamespace
enableHooks: true
chart: {
name: Name
version: "7.11.0"
repository: {
name: Name
url: "https://charts.zitadel.com"
}
}
}
values: #Values
_values: #Values
apiObjectMap: OBJECTS.apiObjectMap
},
]
let OBJECTS = #APIObjects & {
apiObjects: {
ExternalSecret: "zitadel-masterkey": _
VirtualService: "\(Name)": {
@@ -97,8 +77,7 @@ let CAPatch = #Patch & {
patch: yaml.Marshal(DatabaseCACertPatch)
}
// TODO: Replace with #Kustomize & { _patches: foo: {} }
#KustomizePatches: {
#Kustomize: _patches: {
mesh: {
target: {
group: "apps"
@@ -163,3 +142,32 @@ let CAPatch = #Patch & {
}
let DisableFluxPatch = [{op: "replace", path: "/metadata/annotations/kustomize.toolkit.fluxcd.io~1reconcile", value: "disabled"}]
// Upstream helm chart doesn't specify the namespace field for all resources.
#Kustomization: spec: {
targetNamespace: #TargetNamespace
wait: false
}
if #IsPrimaryCluster == true {
#Kustomization: spec: healthChecks: [
{
apiVersion: "apps/v1"
kind: "Deployment"
name: Name
namespace: #TargetNamespace
},
{
apiVersion: "batch/v1"
kind: "Job"
name: "\(Name)-init"
namespace: #TargetNamespace
},
{
apiVersion: "batch/v1"
kind: "Job"
name: "\(Name)-setup"
namespace: #TargetNamespace
},
]
}


@@ -0,0 +1,84 @@
package holos
import "encoding/yaml"
let ArgoCD = "argocd"
let Namespace = "prod-platform"
spec: components: HelmChartList: [
#HelmChart & {
_dependsOn: "prod-secrets-stores": _
namespace: Namespace
metadata: name: "\(namespace)-\(ArgoCD)"
chart: {
name: "argo-cd"
release: "argocd"
version: "6.7.8"
repository: {
name: "argocd"
url: "https://argoproj.github.io/argo-helm"
}
}
_values: #ArgoCDValues & {
kubeVersionOverride: "1.29.0"
global: domain: "argocd.\(#ClusterName).\(#Platform.org.domain)"
dex: enabled: false
// for integration with istio
configs: params: "server.insecure": true
configs: cm: {
"admin.enabled": false
"oidc.config": yaml.Marshal(OIDCConfig)
}
}
// Holos overlay objects
apiObjectMap: OBJECTS.apiObjectMap
},
]
let OBJECTS = #APIObjects & {
apiObjects: {
// ExternalSecret: "deploy-key": _
VirtualService: (ArgoCD): {
metadata: name: ArgoCD
metadata: namespace: Namespace
spec: hosts: [
ArgoCD + ".\(#Platform.org.domain)",
ArgoCD + ".\(#ClusterName).\(#Platform.org.domain)",
]
spec: gateways: ["istio-ingress/default"]
spec: http: [{route: [{destination: {
host: "argocd-server.\(Namespace).svc.cluster.local"
port: number: 80
}}]}]
}
}
}
let IstioInject = [{op: "add", path: "/spec/template/metadata/labels/sidecar.istio.io~1inject", value: "true"}]
#Kustomize: _patches: {
mesh: {
target: {
group: "apps"
version: "v1"
kind: "Deployment"
name: "argocd-server"
}
patch: yaml.Marshal(IstioInject)
}
}
// Probably shouldn't use the authproxy struct and should instead define an identity provider struct.
let AuthProxySpec = #AuthProxySpec & #Platform.authproxy
let OIDCConfig = {
name: "Holos Platform"
issuer: AuthProxySpec.issuer
clientID: #Platform.argocd.clientID
requestedIDTokenClaims: groups: essential: true
requestedScopes: ["openid", "profile", "email", "groups", "urn:zitadel:iam:org:domain:primary:\(AuthProxySpec.orgDomain)"]
enablePKCEAuthentication: true
}
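
The `IstioInject` patch above writes the label key `sidecar.istio.io/inject` as `sidecar.istio.io~1inject` inside the patch path. That is JSON Pointer escaping (RFC 6901): a `/` inside a single path segment becomes `~1`, and a literal `~` becomes `~0`. A minimal Go sketch of the escaping, where the helper names `escapeToken` and `labelPath` are hypothetical and only illustrate how such a path can be built:

```go
package main

import (
	"fmt"
	"strings"
)

// escapeToken escapes one JSON Pointer reference token per RFC 6901.
// strings.NewReplacer scans the input once, so a "~" produced by the
// "/" -> "~1" rule is never escaped a second time.
func escapeToken(token string) string {
	return strings.NewReplacer("~", "~0", "/", "~1").Replace(token)
}

// labelPath is a hypothetical helper that builds the JSON Patch path for a
// pod template label on a Deployment.
func labelPath(labelKey string) string {
	return "/spec/template/metadata/labels/" + escapeToken(labelKey)
}

func main() {
	fmt.Println(labelPath("sidecar.istio.io/inject"))
}
```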

File diff suppressed because it is too large


@@ -0,0 +1,43 @@
package holos
// https://cert-manager.io/docs/
#TargetNamespace: "cert-manager"
spec: components: HelmChartList: [
#HelmChart & {
metadata: name: "prod-mesh-certmanager"
_dependsOn: "prod-secrets-namespaces": _
namespace: #TargetNamespace
_values: #Values & {
installCRDs: true
startupapicheck: enabled: false
// Must not use kube-system on gke autopilot. GKE Warden authz blocks access.
global: leaderElection: namespace: #TargetNamespace
}
chart: {
name: "cert-manager"
version: "1.14.3"
repository: {
name: "jetstack"
url: "https://charts.jetstack.io"
}
}
},
]
// https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-resource-requests#min-max-requests
#PodResources: {
requests: {
cpu: string | *"250m"
memory: string | *"512Mi"
"ephemeral-storage": string | *"100Mi"
}
}
// https://cloud.google.com/kubernetes-engine/docs/how-to/autopilot-spot-pods
#NodeSelector: {
// "kubernetes.io/os": "linux"
// "cloud.google.com/gke-spot": "true"
}


@@ -9,7 +9,7 @@ let GitHubConfigSecret = "controller-manager"
// Just sync the external secret, don't configure the scale set
// Work around https://github.com/actions/actions-runner-controller/issues/3351
if #IsPrimaryCluster == false {
spec: components: KubernetesObjects: [
spec: components: KubernetesObjectsList: [
#KubernetesObjects & {
metadata: name: "prod-github-arc-runner"
_dependsOn: "prod-secrets-namespaces": _
@@ -23,7 +23,7 @@ if #IsPrimaryCluster == false {
// Put the scale set on the primary cluster.
if #IsPrimaryCluster == true {
spec: components: HelmCharts: [
spec: components: HelmChartList: [
#HelmChart & {
_dependsOn: "prod-secrets-namespaces": _
metadata: name: "prod-github-arc-runner"


@@ -3,7 +3,7 @@ package holos
#TargetNamespace: #ARCSystemNamespace
#InputKeys: component: "arc-system"
spec: components: HelmCharts: [
spec: components: HelmChartList: [
#HelmChart & {
metadata: name: "prod-github-arc-system"


@@ -1,15 +1,13 @@
package holos
import "list"
spec: components: KubernetesObjects: [
spec: components: KubernetesObjectsList: [
#KubernetesObjects & {
metadata: name: "prod-secrets-namespaces"
apiObjectMap: (#APIObjects & {
apiObjects: {
// #ManagedNamespaces is the set of all namespaces across all clusters in the platform.
for k, ns in #ManagedNamespaces {
if list.Contains(ns.clusterNames, #ClusterName) {
if ns.clusters[#ClusterName] != _|_ {
Namespace: "\(k)": #Namespace & ns.namespace
}
}


@@ -1,17 +1,19 @@
package holos
#InputKeys: component: "istio-base"
#TargetNamespace: "istio-system"
spec: components: HelmChartList: [
#HelmChart & {
_dependsOn: "prod-secrets-namespaces": _
#HelmChart & {
namespace: #TargetNamespace
chart: {
name: "base"
version: "1.20.3"
repository: {
name: "istio"
url: "https://istio-release.storage.googleapis.com/charts"
metadata: name: "prod-mesh-istio-base"
namespace: "istio-system"
chart: {
name: "base"
version: #IstioVersion
repository: {
name: "istio"
url: "https://istio-release.storage.googleapis.com/charts"
}
}
}
values: #IstioValues
}
_values: #IstioValues
},
]


@@ -1,10 +1,13 @@
package holos
#InputKeys: component: "cni"
#TargetNamespace: "kube-system"
spec: components: HelmChartList: [
#HelmChart & {
_dependsOn: "prod-secrets-namespaces": _
_dependsOn: "prod-mesh-istio-base": _
#HelmChart & {
namespace: #TargetNamespace
chart: name: "cni"
values: #IstioValues
}
_values: #IstioValues
metadata: name: "\(#InstancePrefix)-\(chart.name)"
namespace: "kube-system"
chart: name: "cni"
},
]


@@ -4,43 +4,56 @@ import "list"
// The primary istio Gateway, named default
let Name = "gateway"
#InputKeys: component: Name
#TargetNamespace: "istio-ingress"
#DependsOn: _IngressGateway
let LoginCert = #PlatformCerts.login
spec: components: KubernetesObjectsList: [
#KubernetesObjects & {
_dependsOn: "prod-secrets-stores": _
_dependsOn: "prod-mesh-istio-base": _
_dependsOn: "prod-mesh-ingress": _
#KubernetesObjects & {
apiObjects: {
ExternalSecret: login: #ExternalSecret & {
_name: "login"
metadata: name: "\(#InstancePrefix)-\(Name)"
apiObjectMap: OBJECTS.apiObjectMap
},
]
// GatewayServers represents all hosts for all VirtualServices in the cluster attached to Gateway/default
// NOTE: This is a critical structure because the default Gateway should be used in most cases.
let GatewayServers = {
for Project in _Projects {
for server in (#ProjectTemplate & {project: Project}).ClusterGatewayServers {
(server.port.name): server
}
}
for k, svc in #OptionalServices {
if svc.enabled && list.Contains(svc.clusterNames, #ClusterName) {
for server in svc.servers {
(server.port.name): server
}
}
}
if #PlatformServers[#ClusterName] != _|_ {
for server in #PlatformServers[#ClusterName] {
(server.port.name): server
}
}
}
let OBJECTS = #APIObjects & {
apiObjects: {
Gateway: default: #Gateway & {
metadata: name: "default"
metadata: namespace: #TargetNamespace
spec: selector: istio: "ingressgateway"
spec: servers: [
{
hosts: [for dnsName in LoginCert.spec.dnsNames {"prod-iam-zitadel/\(dnsName)"}]
port: name: "https-prod-iam-login"
port: number: 443
port: protocol: "HTTPS"
tls: credentialName: LoginCert.spec.secretName
tls: mode: "SIMPLE"
},
]
spec: servers: [for x in GatewayServers {x}]
}
for k, svc in #OptionalServices {
if svc.enabled && list.Contains(svc.clusterNames, #ClusterName) {
Gateway: "\(svc.name)": #Gateway & {
metadata: name: svc.name
metadata: namespace: #TargetNamespace
spec: selector: istio: "ingressgateway"
spec: servers: [for s in svc.servers {s}]
}
for k, s in svc.servers {
ExternalSecret: "\(s.tls.credentialName)": _
}


@@ -1,8 +1,12 @@
package holos
let Name = "httpbin"
let SecretName = #InputKeys.cluster + "-" + Name
let MatchLabels = {app: Name} & #SelectorLabels
let ComponentName = "\(#InstancePrefix)-\(Name)"
let MatchLabels = {
app: Name
"app.kubernetes.io/instance": ComponentName
}
let Metadata = {
name: Name
namespace: #TargetNamespace
@@ -12,11 +16,22 @@ let Metadata = {
#InputKeys: component: Name
#TargetNamespace: "istio-ingress"
#DependsOn: _IngressGateway
let Cert = #PlatformCerts[SecretName]
let Cert = #PlatformCerts["\(#ClusterName)-httpbin"]
#KubernetesObjects & {
spec: components: KubernetesObjectsList: [
#KubernetesObjects & {
_dependsOn: "prod-secrets-namespaces": _
_dependsOn: "\(#InstancePrefix)-istio-base": _
_dependsOn: "\(#InstancePrefix)-ingress": _
metadata: name: ComponentName
apiObjectMap: OBJECTS.apiObjectMap
},
]
let OBJECTS = #APIObjects & {
apiObjects: {
ExternalSecret: "\(Cert.spec.secretName)": _
Deployment: httpbin: #Deployment & {
@@ -24,7 +39,6 @@ let Cert = #PlatformCerts[SecretName]
spec: selector: matchLabels: MatchLabels
spec: template: {
metadata: labels: MatchLabels
metadata: labels: #CommonLabels
metadata: labels: #IstioSidecar
spec: securityContext: seccompProfile: type: "RuntimeDefault"
spec: containers: [{
@@ -35,8 +49,8 @@ let Cert = #PlatformCerts[SecretName]
seccompProfile: type: "RuntimeDefault"
allowPrivilegeEscalation: false
runAsNonRoot: true
runAsUser: 1337
runAsGroup: 1337
runAsUser: 8192
runAsGroup: 8192
capabilities: drop: ["ALL"]
}}]
}
@@ -48,24 +62,10 @@ let Cert = #PlatformCerts[SecretName]
{port: 80, targetPort: 8080, protocol: "TCP", name: "http"},
]
}
Gateway: httpbin: #Gateway & {
metadata: Metadata
spec: selector: istio: "ingressgateway"
spec: servers: [
{
hosts: [for host in Cert.spec.dnsNames {"\(#TargetNamespace)/\(host)"}]
port: name: "https-\(#InstanceName)"
port: number: 443
port: protocol: "HTTPS"
tls: credentialName: Cert.spec.secretName
tls: mode: "SIMPLE"
},
]
}
VirtualService: httpbin: #VirtualService & {
metadata: Metadata
spec: hosts: [for host in Cert.spec.dnsNames {host}]
spec: gateways: ["\(#TargetNamespace)/\(Name)"]
spec: gateways: ["istio-ingress/default"]
spec: http: [{route: [{destination: host: Name}]}]
}
}


@@ -2,50 +2,64 @@ package holos
import "encoding/json"
#InputKeys: component: "ingress"
#TargetNamespace: "istio-ingress"
#DependsOn: _IstioD
let ComponentName = "\(#InstancePrefix)-ingress"
#HelmChart & {
chart: name: "gateway"
namespace: #TargetNamespace
values: #GatewayValues & {
// This component expects the load balancer to send the PROXY protocol header.
// Refer to: https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.2/guide/service/annotations/#proxy-protocol-v2
podAnnotations: "proxy.istio.io/config": json.Marshal(_ProxyProtocol)
// TODO This configuration is specific to the OIS Metal NLB, refactor it out to the metal collection.
service: {
type: "NodePort"
annotations: "service.beta.kubernetes.io/aws-load-balancer-proxy-protocol": "*"
externalTrafficPolicy: "Local"
// Add 30000 to the port to get the Nodeport
ports: [
{
name: "status-port"
port: 15021
protocol: "TCP"
targetPort: 15021
nodePort: 30021
},
{
name: "http2"
port: 80
protocol: "TCP"
targetPort: 80
nodePort: 30080
},
{
name: "https"
port: 443
protocol: "TCP"
targetPort: 443
nodePort: 30443
},
]
#TargetNamespace: "istio-ingress"
spec: components: HelmChartList: [
#HelmChart & {
_dependsOn: "prod-secrets-namespaces": _
_dependsOn: "\(#InstancePrefix)-istio-base": _
_dependsOn: "\(#InstancePrefix)-istiod": _
metadata: name: ComponentName
chart: name: "gateway"
namespace: #TargetNamespace
_values: #GatewayValues & {
// This component expects the load balancer to send the PROXY protocol header.
// Refer to: https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.2/guide/service/annotations/#proxy-protocol-v2
podAnnotations: "proxy.istio.io/config": json.Marshal(_ProxyProtocol)
// TODO This configuration is specific to the OIS Metal NLB, refactor it out to the metal collection.
service: {
type: "NodePort"
annotations: "service.beta.kubernetes.io/aws-load-balancer-proxy-protocol": "*"
externalTrafficPolicy: "Local"
// Add 30000 to the port to get the Nodeport
ports: [
{
name: "status-port"
port: 15021
protocol: "TCP"
targetPort: 15021
nodePort: 30021
},
{
name: "http2"
port: 80
protocol: "TCP"
targetPort: 80
nodePort: 30080
},
{
name: "https"
port: 443
protocol: "TCP"
targetPort: 443
nodePort: 30443
},
]
}
}
}
apiObjects: _APIObjects
}
apiObjectMap: OBJECTS.apiObjectMap
// Auth Proxy
apiObjectMap: _IngressAuthProxy.Deployment.apiObjectMap
// Auth Policy
apiObjectMap: _IngressAuthProxy.Policy.apiObjectMap
// Auth Policy Exclusions
apiObjectMap: _AuthPolicyRules.objects.apiObjectMap
},
]
_ProxyProtocol: gatewayTopology: proxyProtocol: {}
@@ -60,36 +74,82 @@ let RedirectMetaName = {
namespace: #TargetNamespace
}
// https-redirect
_APIObjects: {
Gateway: {
"\(RedirectMetaName.name)": #Gateway & {
metadata: RedirectMetaName
spec: selector: GatewayLabels
spec: servers: [{
port: {
number: 80
name: "http2"
protocol: "HTTP2"
}
hosts: ["*"]
// handled by the VirtualService
tls: httpsRedirect: false
}]
let OBJECTS = #APIObjects & {
apiObjects: {
Gateway: {
"\(RedirectMetaName.name)": #Gateway & {
metadata: RedirectMetaName
spec: selector: GatewayLabels
spec: servers: [{
port: {
number: 80
name: "http2"
protocol: "HTTP2"
}
hosts: ["*"]
// handled by the VirtualService
tls: httpsRedirect: false
}]
}
}
}
VirtualService: {
"\(RedirectMetaName.name)": #VirtualService & {
metadata: RedirectMetaName
spec: hosts: ["*"]
spec: gateways: [RedirectMetaName.name]
spec: http: [{
match: [{withoutHeaders: ":path": prefix: "/.well-known/acme-challenge/"}]
redirect: {
scheme: "https"
redirectCode: 302
VirtualService: {
"\(RedirectMetaName.name)": #VirtualService & {
metadata: RedirectMetaName
spec: hosts: ["*"]
spec: gateways: [RedirectMetaName.name]
spec: http: [{
match: [{withoutHeaders: ":path": prefix: "/.well-known/acme-challenge/"}]
redirect: {
scheme: "https"
redirectCode: 302
}
}]
}
}
Deployment: {
loopback: #Deployment & {
_description: LoopbackDescription
metadata: LoopbackMetaName
spec: {
selector: matchLabels: LoopbackLabels
template: {
metadata: {
annotations: "inject.istio.io/templates": "gateway"
annotations: #Description & {
_Description: LoopbackDescription
}
labels: LoopbackLabels & {"sidecar.istio.io/inject": "true"}
}
spec: {
serviceAccountName: "istio-ingressgateway"
// Allow binding to all ports (such as 80 and 443)
securityContext: {
runAsNonRoot: true
seccompProfile: type: "RuntimeDefault"
sysctls: [{name: "net.ipv4.ip_unprivileged_port_start", value: "0"}]
}
containers: [{
name: "istio-proxy"
image: "auto" // Managed by istiod
securityContext: {
allowPrivilegeEscalation: false
capabilities: drop: ["ALL"]
runAsUser: 1337
runAsGroup: 1337
}
}]
}
}
}
}]
}
}
Service: {
loopback: #Service & {
_description: LoopbackDescription
metadata: LoopbackMetaName
spec: selector: LoopbackLabels
spec: ports: [{port: 80, name: "http"}, {port: 443, name: "https"}]
}
}
}
}
@@ -104,52 +164,3 @@ let LoopbackMetaName = {
name: LoopbackName
namespace: #TargetNamespace
}
// istio-ingressgateway-loopback
_APIObjects: {
Deployment: {
loopback: #Deployment & {
_description: LoopbackDescription
metadata: LoopbackMetaName
spec: {
selector: matchLabels: LoopbackLabels
template: {
metadata: {
annotations: "inject.istio.io/templates": "gateway"
annotations: #Description & {
_Description: LoopbackDescription
}
labels: LoopbackLabels & {"sidecar.istio.io/inject": "true"}
}
spec: {
serviceAccountName: "istio-ingressgateway"
// Allow binding to all ports (such as 80 and 443)
securityContext: {
runAsNonRoot: true
seccompProfile: type: "RuntimeDefault"
sysctls: [{name: "net.ipv4.ip_unprivileged_port_start", value: "0"}]
}
containers: [{
name: "istio-proxy"
image: "auto" // Managed by istiod
securityContext: {
allowPrivilegeEscalation: false
capabilities: drop: ["ALL"]
runAsUser: 1337
runAsGroup: 1337
}
}]
}
}
}
}
}
Service: {
loopback: #Service & {
_description: LoopbackDescription
metadata: LoopbackMetaName
spec: selector: LoopbackLabels
spec: ports: [{port: 80, name: "http"}, {port: 443, name: "https"}]
}
}
}


@@ -1,10 +1,8 @@
package holos
#DependsOn: _IstioBase
#HelmChart: {
chart: {
version: "1.20.3"
version: #IstioVersion
repository: {
name: "istio"
url: "https://istio-release.storage.googleapis.com/charts"


@@ -5,22 +5,28 @@ import "encoding/yaml"
#InputKeys: component: "istiod"
#TargetNamespace: "istio-system"
#HelmChart & {
namespace: #TargetNamespace
chart: {
name: "istiod"
}
values: #IstioValues & {
pilot: {
// The istio meshconfig ConfigMap is handled in the holos component instead of
// the upstream chart so extension providers can be collected from holos data.
configMap: false
// Set to `type: RuntimeDefault` to use the default profile if available.
seccompProfile: type: "RuntimeDefault"
spec: components: HelmChartList: [
#HelmChart & {
_dependsOn: "prod-secrets-namespaces": _
_dependsOn: "\(#InstancePrefix)-istio-base": _
metadata: name: "prod-mesh-istiod"
chart: name: "istiod"
namespace: #TargetNamespace
_values: #IstioValues & {
pilot: {
// The istio meshconfig ConfigMap is handled in the holos component instead of
// the upstream chart so extension providers can be collected from holos data.
configMap: false
// Set to `type: RuntimeDefault` to use the default profile if available.
seccompProfile: type: "RuntimeDefault"
}
}
}
apiObjects: ConfigMap: istio: #IstioConfigMap
}
apiObjectMap: OBJECTS.apiObjectMap
},
]
let OBJECTS = #APIObjects & {apiObjects: ConfigMap: istio: #IstioConfigMap}
#IstioConfigMap: #ConfigMap & {
metadata: {


@@ -1,74 +1,9 @@
package holos
// Ingress Gateway default auth proxy
let Provider = _IngressAuthProxy.AuthProxySpec.provider
let Service = _IngressAuthProxy.service
#MeshConfig: extensionProviderMap: (Provider): envoyExtAuthzHttp: service: Service
// Istio meshconfig
// TODO: Generate per-project extauthz providers.
_MeshConfig: {
accessLogEncoding: "JSON"
accessLogFile: "/dev/stdout"
defaultConfig: {
discoveryAddress: "istiod.istio-system.svc:15012"
tracing: zipkin: address: "zipkin.istio-system:9411"
}
defaultProviders: metrics: ["prometheus"]
enablePrometheusMerge: true
// For PROXY PROTOCOL at the ingress gateway.
gatewayTopology: {
numTrustedProxies: 2
}
rootNamespace: "istio-system"
trustDomain: "cluster.local"
extensionProviders: [{
name: "cluster-trace"
zipkin: {
maxTagLength: 56
port: 9411
service: "zipkin.istio-system.svc"
}
}, {
name: "cluster-gatekeeper"
envoyExtAuthzHttp: {
headersToDownstreamOnDeny: [
"content-type",
"set-cookie",
]
headersToUpstreamOnAllow: [
"authorization",
"path",
"x-auth-request-user",
"x-auth-request-email",
"x-auth-request-access-token",
]
includeAdditionalHeadersInCheck: "X-Auth-Request-Redirect": "%REQ(x-forwarded-proto)%://%REQ(:authority)%%REQ(:path)%%REQ(:query)%"
includeRequestHeadersInCheck: [
"authorization",
"cookie",
"x-forwarded-for",
]
port: 4180
service: "oauth2-proxy.istio-ingress.svc.cluster.local"
}
}, {
name: "core-authorizer"
envoyExtAuthzHttp: {
headersToDownstreamOnDeny: [
"content-type",
"set-cookie",
]
headersToUpstreamOnAllow: [
"authorization",
"path",
"x-auth-request-user",
"x-auth-request-email",
"x-auth-request-access-token",
]
includeAdditionalHeadersInCheck: "X-Auth-Request-Redirect": "%REQ(x-forwarded-proto)%://%REQ(:authority)%%REQ(:path)%%REQ(:query)%"
includeRequestHeadersInCheck: [
"authorization",
"cookie",
"x-forwarded-for",
]
port: 4180
service: "oauth2-proxy.prod-core-system.svc.cluster.local"
}
}]
}
_MeshConfig: (#MeshConfig & {projects: _Projects}).config


@@ -126,7 +126,7 @@ package holos
hub: "docker.io/istio"
// Default tag for Istio images.
tag: "1.20.3"
tag: #IstioVersion
// Variant of the image to use.
// Currently supported are: [debug, distroless]

Some files were not shown because too many files have changed in this diff