132 Commits

Author SHA1 Message Date
Carsten Schafer
551d56c62c Add another GH worker
Signed-off-by: Carsten Schafer <Carsten.Schafer@kinarasystems.com>
2024-08-06 08:57:41 -04:00
Carsten Schafer
120f175afc Prometheus into 2nd cluster
Signed-off-by: Carsten Schafer <Carsten.Schafer@kinarasystems.com>
2024-05-30 15:53:55 -04:00
Carsten Schafer
60ecc0687c WIFI-13130: Add k8s dashboard as well and fix certs
Signed-off-by: Carsten Schafer <Carsten.Schafer@kinarasystems.com>
2024-04-10 12:16:50 -04:00
Carsten Schafer
e616156663 Install new cluster for OWLS testing
Signed-off-by: Carsten Schafer <Carsten.Schafer@kinarasystems.com>
2024-04-10 09:39:49 -04:00
Carsten Schafer
f891f5c864 Enable and upgrade prometheus and grafana
Signed-off-by: Carsten Schafer <Carsten.Schafer@kinarasystems.com>
2024-01-18 16:49:35 -05:00
Carsten Schafer
2d52cdbdb9 Use basic auth for k8s dashboard and add optional Portainer deployment 2024-01-15 15:41:38 -05:00
Carsten Schafer
7a5f89bfc6 Reflect current state of what's installed including actions runner for github most recently
Signed-off-by: Carsten Schafer <Carsten.Schafer@kinarasystems.com>
2023-12-07 08:46:25 -05:00
Dmitry Dunaev
87bd371314 [WIFI-12022] Fix: deprecated bitnami app versions
Signed-off-by: Dmitry Dunaev <dmitry@opsfleet.com>
2022-12-21 13:32:12 +01:00
Johann Hoffmann
2117e44ae3 [WIFI-11785] Fix Elasticsearch instability (#232)
* Remove readiness check from data nodes

Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>

* Adapt Toolsmith repo name for Atlantis

Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>

* Increase ES resources

Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>

* Decrease ES data pod memory limit again

Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>

Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
2022-12-06 16:37:38 +01:00
Dmitry Dunaev
e3fd692910 [WIFI-11553] Chg: cleanup 2022-11-22 14:24:43 +03:00
Johann Hoffmann
6d9df4453f Adapt alert to use new exitcode metric
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
2022-11-09 15:33:31 +01:00
Dmitry Dunaev
ebf7022c81 [WIFI-11509] Chg: switch wlan-testing QA dashboard to one used in analytics
Signed-off-by: Dmitry Dunaev <dmitry@opsfleet.com>
2022-11-09 14:44:57 +03:00
Dmitry Dunaev
1bd4032e8c [WIFI-10880] Add: Helmfile - QA dashboard and fix dependencies
Signed-off-by: Dmitry Dunaev <dmitry@opsfleet.com>
2022-11-03 16:37:57 +03:00
Johann Hoffmann
82945b6846 Increase readiness timeout value for data nodes (#224)
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>

Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
2022-11-03 13:13:34 +01:00
Johann Hoffmann
23d6fbd3c2 [WIFI-11295] elasticsearch-client crashing in restart loop (#222)
* Increase heap size for elasticsearch-client

Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>

* Adapt resource limits after heapsize increase

Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>

* Switch to local Elasticsearch chart and increase timeoutSeconds

Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>

* Increase CPU limit for elasticsearch-data pods

Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>

* Commit Terraform lock file for last PR

Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>

Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
2022-10-27 16:50:43 +02:00
Dmitry Dunaev
bb5b14207c [WIFI-9213] Chg: decrease amount of allowed self-hosted runners
Signed-off-by: Dmitry Dunaev <dmitry@opsfleet.com>
2022-10-23 19:17:55 +03:00
Johann Hoffmann
190ab3f4d3 Upgrade kube-prometheus-stack to 41.5.1 (#219)
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>

Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
2022-10-23 14:30:33 +02:00
Johann Hoffmann
8e314cbea7 Add missing Helmfile values for core-dump-handler and separate pod termination alerts (#216)
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
2022-07-28 13:33:09 +02:00
Johann Hoffmann
eee3b1690b [WIFI-10088] Research and install solution to keep coredumps for debugging purposes (#215)
* Add IAM user and bucket for core-dump-handler

Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>

* Fix Terraform format

Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>

* Add core-dumps-s3 to Atlantis

Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>

* Remove outputs.tf and switch to separate S3 ACL resource

Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>

* Fix Terraform state key name

Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>

* Add core-dump-handler to helmfile

Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>

* Commit helmfile.lock

Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>

* Enable helmfile deployment of core-dump-handler

Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
2022-07-21 17:33:42 +02:00
Dmitry Dunaev
43a2581f2f [TOOLS-150] Add: ingress nginx setup 2022-07-18 15:03:52 +03:00
Dmitry Dunaev
6023c8a5ba [TOOLS-150] Chg: upgrade nginx ingress
Signed-off-by: Dmitry Dunaev <dmitry@opsfleet.com>
2022-07-18 14:17:45 +03:00
Johann Hoffmann
f218f3e4ab [WIFI-10094] Prometheus alerts are not getting sent to Slack (#213)
* Renew webhook URL and remove obsolete resources

Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>

* Remove obsolete secret file reference

Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>

* Re-add Elasticsearch datasource for future use

Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
2022-07-14 17:45:20 +02:00
Dmitry Dunaev
24c393ae86 [WIFI-9824] Add: elasticsearch exporter + example alert
Signed-off-by: Dmitry Dunaev <dmitry@opsfleet.com>
2022-07-14 12:22:07 +03:00
Johann Hoffmann
839355238a Re-add lost config and add encrypted Slack API URL (#211)
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
2022-07-11 19:47:18 +02:00
Johann Hoffmann
a8beeed98b [TOOLS-151] Add Prometheus alert for container segfault (#210)
* Change pod OOM alert to also fire when termination reason is error

Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>

* Update alert description

Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
2022-07-08 11:26:58 +02:00
Dmitry Dunaev
f71d00d6d5 [WIFI-9930] Fix: github runner autoscaler trigger target
Signed-off-by: Dmitry Dunaev <dmitry@opsfleet.com>
2022-06-27 12:48:14 +03:00
Dmitry Dunaev
4dbb204a3e [WIFI-9828] Add: resources limits for services
Signed-off-by: Dmitry Dunaev <dmitry@opsfleet.com>
2022-06-24 16:10:15 +03:00
Dmitry Dunaev
95c6b59b05 [TOOLS-150] Chg: move kibana chart to local
Signed-off-by: Dmitry Dunaev <dmitry@opsfleet.com>
2022-06-23 14:06:53 +03:00
Dmitry Dunaev
9ece1dcb8b [WIFI-9658] Add: cert-manager resources limits to make it predictable
Signed-off-by: Dmitry Dunaev <dmitry@opsfleet.com>
2022-06-20 12:08:29 +03:00
Johann Hoffmann
0015e8b1b7 Switch to correct apiVersion for ingress resource
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
2022-06-15 18:42:58 +02:00
Johann Hoffmann
52a7bf2365 Switch deprecated apiVersion to networking.k8s.io/v1
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
2022-06-15 18:04:15 +02:00
Dmitry Dunaev
e90f48ebee [WIFI-8050] Chg: upgrade EKS to 1.21 and alb ingress controller chart to 1.4.2
Signed-off-by: Dmitry Dunaev <dmitry@opsfleet.com>
2022-05-27 13:23:27 +03:00
Dmitry Dunaev
c13e53fdee [WIFI-3570] Add: Performance Grafana dashboard in Helmfile
Signed-off-by: Dmitry Dunaev <dmitry@opsfleet.com>
2022-05-17 17:03:07 +03:00
Johann Hoffmann
3b5a2556bd Add grok pattern to parse Docker logs and remove unnecessary field
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
2022-05-17 11:14:16 +02:00
Johann Hoffmann
1a90446115 Push helmfile.lock after adding projectcalico repo
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
2022-05-12 12:01:06 +02:00
Johann Hoffmann
cf1b80082e Fix missing repo, increase logstash resources and remove Elasticsearch internal LoadBalancer resource
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
2022-05-12 11:28:50 +02:00
Dmitry Dunaev
9985c85ac4 [WIFI-7854] Fix: allow access from github actions to kube api
Signed-off-by: Dmitry Dunaev <dmitry@opsfleet.com>
2022-05-09 15:28:07 +03:00
Johann Hoffmann
9d46389431 Expose Elasticsearch through internal AWS LB
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
2022-05-03 15:45:44 +02:00
Dmitry Dunaev
bfe89fbd96 [WIFI-3399] Add: calico in helmfile
Signed-off-by: Dmitry Dunaev <dmitry@opsfleet.com>
2022-04-27 10:50:16 +03:00
Johann Hoffmann
3526e534ab Remove old Helm chart templates and Grafana dashboards
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
2022-04-11 12:49:34 +02:00
Dmitry Dunaev
dcb2c7e476 [WIFI-7207] Chg: github actions ingress pathType to ImplementationSpecific
Signed-off-by: Dmitry Dunaev <dmitry@opsfleet.com>
2022-03-02 13:17:06 +03:00
Dmitry Dunaev
3aa5e233bc [WIFI-7207] Chg: update aws-load-balancer-controller to 1.4.0
Signed-off-by: Dmitry Dunaev <dmitry@opsfleet.com>
2022-03-01 15:51:23 +03:00
Max Brenner
557b5bb489 fix actions-runner-controller deployment 2022-02-21 13:15:18 +01:00
Max
1e670cc4e1 update actions-runner-controller to version 0.16.1 (#190)
* update actions-runner-controller to version 0.16.1
2022-02-21 12:00:53 +01:00
Max
2a7e16e481 add alert for PVC consumption (#188) 2022-01-28 14:54:53 +01:00
Max
cd8756230b update various Helm charts (#183) 2022-01-28 14:48:48 +01:00
Max Brenner
fb6ca122da increase Github runner memory and CPU 2022-01-12 13:48:36 +01:00
Max
2852df9305 update kube-prometheus-stack (#181) 2022-01-11 17:06:45 +01:00
Max
8d75f7e00a upgrade prometheus-operator (#173) 2022-01-03 10:54:35 +01:00
Max
7cff0f07fd upgrade cluster autoscaler (#172) 2021-12-29 16:39:18 +01:00