Add monitoring for NATS

Co-authored-by: Andrei Kvapil <kvapss@gmail.com>
Signed-off-by: kklinch0 <kklinch0@gmail.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
kklinch0
2025-09-03 15:27:38 +03:00
committed by Andrei Kvapil
parent affd91dd41
commit 7ac989923d
12 changed files with 3086 additions and 30 deletions

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -83,6 +83,8 @@ modules/340-monitoring-kubernetes/monitoring/grafana-dashboards//flux/flux-stats
modules/340-monitoring-kubernetes/monitoring/grafana-dashboards//kafka/strimzi-kafka.json
modules/340-monitoring-kubernetes/monitoring/grafana-dashboards//seaweedfs/seaweedfs.json
modules/340-monitoring-kubernetes/monitoring/grafana-dashboards//goldpinger/goldpinger.json
modules/340-monitoring-kubernetes/monitoring/grafana-dashboards//nats/nats-jetstream.json
modules/340-monitoring-kubernetes/monitoring/grafana-dashboards//nats/nats-server.json
EOT


@@ -95,10 +95,6 @@ spec:
{{- with .Values.storageClass }}
storageClassName: {{ . }}
{{- end }}
promExporter:
enabled: true
podMonitor:
enabled: true
{{- if .Values.external }}
service:
merge:


@@ -43,3 +43,5 @@ hubble/overview
hubble/dns-namespace
hubble/l7-http-metrics
hubble/network-overview
nats/nats-jetstream
nats/nats-server


@@ -1,5 +1,5 @@
apiVersion: v2
appVersion: 2.10.17
appVersion: 2.11.8
description: A Helm chart for the NATS.io High Speed Cloud Native Distributed Communications
Technology.
home: http://github.com/nats-io/k8s
@@ -13,4 +13,4 @@ maintainers:
name: The NATS Authors
url: https://github.com/nats-io
name: nats
version: 1.2.1
version: 1.3.13


@@ -44,7 +44,7 @@ Everything in the NATS Config or Kubernetes Resources can be overridden by `merg
| `container` | nats [k8s Container](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.24/#container-v1-core) | yes |
| `reloader` | config reloader [k8s Container](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.24/#container-v1-core) | yes |
| `promExporter` | prometheus exporter [k8s Container](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.24/#container-v1-core) | no |
| `promExporter.podMonitor` | [prometheus PodMonitor](https://prometheus-operator.dev/docs/operator/api/#monitoring.coreos.com/v1.PodMonitor) | no |
| `promExporter.podMonitor` | [prometheus PodMonitor](https://prometheus-operator.dev/docs/api-reference/api/#monitoring.coreos.com/v1.PodMonitor) | no |
| `service` | [k8s Service](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.24/#service-v1-core) | yes |
| `statefulSet` | [k8s StatefulSet](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.24/#statefulset-v1-apps) | yes |
| `podTemplate` | [k8s PodTemplate](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.24/#pod-v1-core) | yes |
@@ -60,7 +60,7 @@ Everything in the NATS Config or Kubernetes Resources can be overridden by `merg
### Merge
Merging is performed using the Helm `merge` function. Example - add NATS accounts and container resources:
Merging is performed using the Helm [`merge` function](https://helm.sh/docs/chart_template_guide/function_list/#merge-mustmerge). Example - add NATS accounts and container resources:
```yaml
config:
@@ -119,14 +119,22 @@ podTemplate:
### NATS Container Resources
We recommend setting both **requests and limits** - for both **CPU and memory** - **to the same value** for the following reasons:
* It ensures your NATS pod has [predictable performance](https://www.datadoghq.com/blog/kubernetes-cpu-requests-limits/#predictability:~:text=If%20containers%20are,available%20capacity%20decreases.).
* The NATS server [automatically sets](https://github.com/nats-io/nats-server/blob/v2.11.0/main.go#L131-L132) [GOMAXPROCS](https://github.com/golang/go/blob/go1.24.1/src/runtime/extern.go#L230-L234) to the number of CPU cores defined in the `limits` section. If `limits` are not set, GOMAXPROCS defaults to the node's physical core count, which can lead to [poor performance](https://github.com/golang/go/issues/33803).
* The pod will be assigned to the ["Guaranteed" QoS class](https://kubernetes.io/docs/concepts/workloads/pods/pod-qos/#guaranteed), making it less likely to be evicted when node resources are constrained.
Deviate from this recommendation only if you fully understand the implications of your settings.
```yaml
container:
env:
# different from k8s units, suffix must be B, KiB, MiB, GiB, or TiB
# should be ~90% of memory limit
GOMEMLIMIT: 7GiB
# Different from k8s units, suffix must be B, KiB, MiB, GiB, or TiB
# Should be ~80% of memory limit
GOMEMLIMIT: 6GiB
merge:
# recommended limit is at least 2 CPU cores and 8Gi Memory for production JetStream clusters
# Recommended minimum: at least 2 CPU cores and 8Gi memory for production JetStream clusters
resources:
requests:
cpu: "2"
@@ -138,11 +146,27 @@ container:
### Specify Image Version
```yaml
container:
image:
tag: x.y.z-alpine
```
The container image can now be overridden by specifying either the image tag, an image digest, or a full image name. Examples below illustrate the options:
- To set the tag:
```yaml
container:
image:
tag: x.y.z-alpine
```
- To use an image digest, which overrides the tag:
```yaml
container:
image:
repository: nats
digest: sha256:abcdef1234567890...
```
- To override the registry, repository, tag, and digest all at once, specify a full image name:
```yaml
container:
image:
fullImageName: custom-reg.io/myimage@sha256:abcdef1234567890...
```
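The precedence among these options can be sketched in Go. This is a minimal illustration, not the chart's code: the `ImageSpec` type and `Resolve` method are hypothetical names, and the sketch ignores the global registry fallback that the chart's helper also consults.

```go
package main

import "fmt"

// ImageSpec is a hypothetical mirror of the chart's image values.
type ImageSpec struct {
	Registry      string
	Repository    string
	Tag           string
	Digest        string
	FullImageName string
}

// Resolve applies the documented precedence: fullImageName wins outright,
// otherwise a digest replaces the tag, and a registry (if set) is prepended.
func (s ImageSpec) Resolve() string {
	if s.FullImageName != "" {
		return s.FullImageName
	}
	image := s.Repository + ":" + s.Tag
	if s.Digest != "" {
		image = s.Repository + "@" + s.Digest
	}
	if s.Registry != "" {
		image = s.Registry + "/" + image
	}
	return image
}

func main() {
	// Tag only.
	fmt.Println(ImageSpec{Repository: "nats", Tag: "2.11.8-alpine"}.Resolve())
	// Digest overrides the tag (the digest value here is a placeholder).
	fmt.Println(ImageSpec{Repository: "nats", Tag: "2.11.8-alpine", Digest: "sha256:abc"}.Resolve())
	// Full image name overrides everything else.
	fmt.Println(ImageSpec{Repository: "nats", Tag: "2.11.8-alpine", FullImageName: "custom-reg.io/myimage@sha256:abc"}.Resolve())
}
```

Checking `FullImageName` first keeps the override rule in one place, so the remaining branches only deal with the tag versus digest choice.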
### Operator Mode with NATS Resolver


@@ -69,3 +69,7 @@ spec:
- {{ merge (dict "topologyKey" $k "labelSelector" (dict "matchLabels" (include "nats.selectorLabels" $ | fromYaml))) $v | toYaml | nindent 4 }}
{{- end }}
{{- end}}
# terminationGracePeriodSeconds determines how long to wait for graceful shutdown
# this should be at least `lameDuckGracePeriod` + 20s shutdown overhead
terminationGracePeriodSeconds: 60


@@ -27,4 +27,5 @@ args:
{{- if .Values.config.gateway.enabled }}
- -gatewayz
{{- end }}
- http://localhost:{{ .Values.config.monitor.port }}/
{{- $monitorProto := ternary "https" "http" .Values.config.monitor.tls.enabled }}
- {{ $monitorProto }}://{{ .Values.promExporter.monitorDomain }}:{{ .Values.config.monitor.port }}/
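
The URL the exporter receives after this change can be sketched in Go. The function name `monitorURL` is hypothetical, and the port and domain values in `main` are illustrative assumptions rather than values taken from this chart.

```go
package main

import "fmt"

// monitorURL mirrors the template logic: the scheme follows the monitor TLS
// setting, and the host comes from promExporter.monitorDomain instead of a
// hard-coded localhost.
func monitorURL(tlsEnabled bool, monitorDomain string, port int) string {
	proto := "http"
	if tlsEnabled {
		proto = "https"
	}
	return fmt.Sprintf("%s://%s:%d/", proto, monitorDomain, port)
}

func main() {
	// Without TLS the default domain (localhost) is sufficient.
	fmt.Println(monitorURL(false, "localhost", 8222))
	// With TLS the domain should match the certificate's common name or a SAN
	// (nats.example.internal is a made-up name).
	fmt.Println(monitorURL(true, "nats.example.internal", 8222))
}
```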


@@ -147,10 +147,18 @@ app.kubernetes.io/component: nats-box
Print the image
*/}}
{{- define "nats.image" }}
{{- $image := printf "%s:%s" .repository .tag }}
{{- $image := "" }}
{{- if .digest }}
{{- $image = printf "%s@%s" .repository .digest }}
{{- else }}
{{- $image = printf "%s:%s" .repository .tag }}
{{- end }}
{{- if or .registry .global.image.registry }}
{{- $image = printf "%s/%s" (.registry | default .global.image.registry) $image }}
{{- end -}}
{{- end }}
{{- if .fullImageName }}
{{- $image = .fullImageName }}
{{- end }}
image: {{ $image }}
{{- if or .pullPolicy .global.image.pullPolicy }}
imagePullPolicy: {{ .pullPolicy | default .global.image.pullPolicy }}
@@ -274,7 +282,7 @@ output: string with following format rules
*/}}
{{- define "nats.formatConfig" -}}
{{-
(regexReplaceAll "\"<<\\s+(.*)\\s+>>\""
(regexReplaceAll "\"<<\\s+(.*?)\\s+>>\""
(regexReplaceAll "\".*\\$include\": \"(.*)\",?" (include "toPrettyRawJson" .) "include ${1};")
"${1}")
-}}
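
The switch from `(.*)` to `(.*?)` matters when two `<< ... >>` markers end up on the same line. The Go sketch below assumes Helm's `regexReplaceAll` follows Go `regexp` semantics; the function names and the sample input are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Old pattern: greedy capture, runs to the last `>>"` on the line.
	greedy = regexp.MustCompile(`"<<\s+(.*)\s+>>"`)
	// New pattern: lazy capture, stops at the first `>>"` after each marker.
	lazy = regexp.MustCompile(`"<<\s+(.*?)\s+>>"`)
)

// unwrapGreedy strips the quoted << >> wrapper using the old pattern.
func unwrapGreedy(s string) string { return greedy.ReplaceAllString(s, "${1}") }

// unwrapLazy strips the quoted << >> wrapper using the new pattern.
func unwrapLazy(s string) string { return lazy.ReplaceAllString(s, "${1}") }

func main() {
	in := `["<< $A >>", "<< $B >>"]`
	fmt.Println(unwrapGreedy(in)) // [$A >>", "<< $B]
	fmt.Println(unwrapLazy(in))   // [$A, $B]
}
```

With the greedy pattern the two markers are treated as one match and the inner quotes leak into the output; the lazy pattern unwraps each marker separately.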


@@ -238,6 +238,7 @@ config:
tls:
# config.nats.tls must be enabled also
# when enabled, monitoring port will use HTTPS with the options from config.nats.tls
# if promExporter is also enabled, consider setting promExporter.monitorDomain
enabled: false
profiling:
@@ -312,9 +313,13 @@ config:
container:
image:
repository: nats
tag: 2.10.17-alpine
tag: 2.11.8-alpine
pullPolicy:
registry:
# if digest is provided, it overrides tag (example: "sha256:abcdef1234567890")
digest:
# if fullImageName is provided, it overrides registry, repository, tag, and digest
fullImageName:
# container port options
# must be enabled in the config section also
@@ -353,9 +358,11 @@ reloader:
enabled: true
image:
repository: natsio/nats-server-config-reloader
tag: 0.15.0
tag: 0.19.1
pullPolicy:
registry:
digest:
fullImageName:
# env var map, see nats.env for an example
env: {}
@@ -363,7 +370,7 @@ reloader:
# all nats container volume mounts with the following prefixes
# will be mounted into the reloader container
natsVolumeMountPrefixes:
- /etc/
- /etc/
# merge or patch the container
# https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.24/#container-v1-core
@@ -378,11 +385,16 @@ promExporter:
enabled: false
image:
repository: natsio/prometheus-nats-exporter
tag: 0.15.0
tag: 0.17.3
pullPolicy:
registry:
digest:
fullImageName:
port: 7777
# if config.monitor.tls.enabled is set to true, monitorDomain must be set to the common name
# or a SAN used in the tls certificate
monitorDomain: localhost
# env var map, see nats.env for an example
env: {}
@@ -398,13 +410,12 @@ promExporter:
enabled: false
# merge or patch the pod monitor
# https://prometheus-operator.dev/docs/operator/api/#monitoring.coreos.com/v1.PodMonitor
# https://prometheus-operator.dev/docs/api-reference/api/#monitoring.coreos.com/v1.PodMonitor
merge: {}
patch: []
# defaults to "{{ include "nats.fullname" $ }}"
name:
############################################################
# service
############################################################
@@ -511,7 +522,6 @@ serviceAccount:
# defaults to "{{ include "nats.fullname" $ }}"
name:
############################################################
# natsBox
#
@@ -564,9 +574,11 @@ natsBox:
container:
image:
repository: natsio/nats-box
tag: 0.14.3
tag: 0.18.0
pullPolicy:
registry:
digest:
fullImageName:
# env var map, see nats.env for an example
env: {}
@@ -624,7 +636,6 @@ natsBox:
# defaults to "{{ include "nats.fullname" $ }}-box"
name:
################################################################################
# Extra user-defined resources
################################################################################


@@ -9,3 +9,7 @@ nats:
cluster:
routeURLs:
k8sClusterDomain: cozy.local
promExporter:
enabled: true
podMonitor:
enabled: true