Carsten Schafer
551d56c62c
Add another GH worker
...
Signed-off-by: Carsten Schafer <Carsten.Schafer@kinarasystems.com>
2024-08-06 08:57:41 -04:00
Carsten Schafer
120f175afc
Prometheus into 2nd cluster
...
Signed-off-by: Carsten Schafer <Carsten.Schafer@kinarasystems.com>
2024-05-30 15:53:55 -04:00
Carsten Schafer
60ecc0687c
WIFI-13130: Add k8s dashboard as well and fix certs
...
Signed-off-by: Carsten Schafer <Carsten.Schafer@kinarasystems.com>
2024-04-10 12:16:50 -04:00
Carsten Schafer
e616156663
Install new cluster for OWLS testing
...
Signed-off-by: Carsten Schafer <Carsten.Schafer@kinarasystems.com>
2024-04-10 09:39:49 -04:00
Carsten Schafer
f891f5c864
Enable and upgrade prometheus and grafana
...
Signed-off-by: Carsten Schafer <Carsten.Schafer@kinarasystems.com>
2024-01-18 16:49:35 -05:00
Carsten Schafer
2d52cdbdb9
Use basic auth for k8s dashboard and add optional Portainer deployment
2024-01-15 15:41:38 -05:00
Carsten Schafer
7a5f89bfc6
Reflect current state of what's installed including actions runner for github most recently
...
Signed-off-by: Carsten Schafer <Carsten.Schafer@kinarasystems.com>
2023-12-07 08:46:25 -05:00
Dmitry Dunaev
87bd371314
[WIFI-12022] Fix: deprecated bitnami app versions
...
Signed-off-by: Dmitry Dunaev <dmitry@opsfleet.com>
2022-12-21 13:32:12 +01:00
Johann Hoffmann
2117e44ae3
[WIFI-11785] Fix Elasticsearch instability (#232)
...
* Remove readiness check from data nodes
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
* Adapt Toolsmith repo name for Atlantis
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
* Increase ES resources
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
* Decrease ES data pod memory limit again
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
2022-12-06 16:37:38 +01:00
Dmitry Dunaev
e3fd692910
[WIFI-11553] Chg: cleanup
2022-11-22 14:24:43 +03:00
Johann Hoffmann
6d9df4453f
Adapt alert to use new exitcode metric
...
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
2022-11-09 15:33:31 +01:00
Dmitry Dunaev
ebf7022c81
[WIFI-11509] Chg: switch wlan-testing QA dashboard to one used in analytics
...
Signed-off-by: Dmitry Dunaev <dmitry@opsfleet.com>
2022-11-09 14:44:57 +03:00
Dmitry Dunaev
1bd4032e8c
[WIFI-10880] Add: Helmfile - QA dashboard and fix dependencies
...
Signed-off-by: Dmitry Dunaev <dmitry@opsfleet.com>
2022-11-03 16:37:57 +03:00
Johann Hoffmann
82945b6846
Increase readiness timeout value for data nodes (#224)
...
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
2022-11-03 13:13:34 +01:00
Johann Hoffmann
23d6fbd3c2
[WIFI-11295] elasticsearch-client crashing in restart loop (#222)
...
* Increase heap size for elasticsearch-client
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
* Adapt resource limits after heapsize increase
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
* Switch to local Elasticsearch chart and increase timeoutSeconds
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
* Increase CPU limit for elasticsearch-data pods
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
* Commit Terraform lock file for last PR
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
2022-10-27 16:50:43 +02:00
Dmitry Dunaev
bb5b14207c
[WIFI-9213] Chg: decrease amount of allowed self-hosted runners
...
Signed-off-by: Dmitry Dunaev <dmitry@opsfleet.com>
2022-10-23 19:17:55 +03:00
Johann Hoffmann
190ab3f4d3
Upgrade kube-prometheus-stack to 41.5.1 (#219)
...
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
2022-10-23 14:30:33 +02:00
Johann Hoffmann
8e314cbea7
Add missing Helmfile values for core-dump-handler and separate pod termination alerts (#216)
...
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
2022-07-28 13:33:09 +02:00
Johann Hoffmann
eee3b1690b
[WIFI-10088] Research and install solution to keep coredumps for debugging purposes (#215)
...
* Add IAM user and bucket for core-dump-handler
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
* Fix Terraform format
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
* Add core-dumps-s3 to Atlantis
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
* Remove outputs.tf and switch to separate S3 ACL resource
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
* Fix Terraform state key name
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
* Add core-dump-handler to helmfile
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
* Commit helmfile.lock
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
* Enable helmfile deployment of core-dump-handler
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
2022-07-21 17:33:42 +02:00
Dmitry Dunaev
43a2581f2f
[TOOLS-150] Add: ingress nginx setup
2022-07-18 15:03:52 +03:00
Dmitry Dunaev
6023c8a5ba
[TOOLS-150] Chg: upgrade nginx ingress
...
Signed-off-by: Dmitry Dunaev <dmitry@opsfleet.com>
2022-07-18 14:17:45 +03:00
Johann Hoffmann
f218f3e4ab
[WIFI-10094] Prometheus alerts are not getting sent to Slack (#213)
...
* Renew webhook URL and remove obsolete resources
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
* Remove obsolete secret file reference
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
* Re-add Elasticsearch datasource for future use
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
2022-07-14 17:45:20 +02:00
Dmitry Dunaev
24c393ae86
[WIFI-9824] Add: elasticsearch exporter + example alert
...
Signed-off-by: Dmitry Dunaev <dmitry@opsfleet.com>
2022-07-14 12:22:07 +03:00
Johann Hoffmann
839355238a
Re-add lost config and add encrypted Slack API URL (#211)
...
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
2022-07-11 19:47:18 +02:00
Johann Hoffmann
a8beeed98b
[TOOLS-151] Add Prometheus alert for container segfault (#210)
...
* Change pod OOM alert to also fire when termination reason is error
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
* Update alert description
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
2022-07-08 11:26:58 +02:00
Dmitry Dunaev
f71d00d6d5
[WIFI-9930] Fix: github runner autoscaler trigger target
...
Signed-off-by: Dmitry Dunaev <dmitry@opsfleet.com>
2022-06-27 12:48:14 +03:00
Dmitry Dunaev
4dbb204a3e
[WIFI-9828] Add: resources limits for services
...
Signed-off-by: Dmitry Dunaev <dmitry@opsfleet.com>
2022-06-24 16:10:15 +03:00
Dmitry Dunaev
95c6b59b05
[TOOLS-150] Chg: move kibana chart to local
...
Signed-off-by: Dmitry Dunaev <dmitry@opsfleet.com>
2022-06-23 14:06:53 +03:00
Dmitry Dunaev
9ece1dcb8b
[WIFI-9658] Add: cert-manager resources limits to make it predictable
...
Signed-off-by: Dmitry Dunaev <dmitry@opsfleet.com>
2022-06-20 12:08:29 +03:00
Johann Hoffmann
0015e8b1b7
Switch to correct apiVersion for ingress resource
...
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
2022-06-15 18:42:58 +02:00
Johann Hoffmann
52a7bf2365
Switch deprecated apiVersion to networking.k8s.io/v1
...
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
2022-06-15 18:04:15 +02:00
Dmitry Dunaev
e90f48ebee
[WIFI-8050] Chg: upgrade EKS to 1.21 and alb ingress controller chart to 1.4.2
...
Signed-off-by: Dmitry Dunaev <dmitry@opsfleet.com>
2022-05-27 13:23:27 +03:00
Dmitry Dunaev
c13e53fdee
[WIFI-3570] Add: Performance Grafana dashboard in Helmfile
...
Signed-off-by: Dmitry Dunaev <dmitry@opsfleet.com>
2022-05-17 17:03:07 +03:00
Johann Hoffmann
3b5a2556bd
Add grok pattern to parse Docker logs and remove unnecessary field
...
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
2022-05-17 11:14:16 +02:00
Johann Hoffmann
1a90446115
Push helmfile.lock after adding projectcalico repo
...
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
2022-05-12 12:01:06 +02:00
Johann Hoffmann
cf1b80082e
Fix missing repo, increase logstash resources and remove Elasticsearch internal LoadBalancer resource
...
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
2022-05-12 11:28:50 +02:00
Dmitry Dunaev
9985c85ac4
[WIFI-7854] Fix: allow access from github actions to kube api
...
Signed-off-by: Dmitry Dunaev <dmitry@opsfleet.com>
2022-05-09 15:28:07 +03:00
Johann Hoffmann
9d46389431
Expose Elasticsearch through internal AWS LB
...
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
2022-05-03 15:45:44 +02:00
Dmitry Dunaev
bfe89fbd96
[WIFI-3399] Add: calico in helmfile
...
Signed-off-by: Dmitry Dunaev <dmitry@opsfleet.com>
2022-04-27 10:50:16 +03:00
Johann Hoffmann
3526e534ab
Remove old Helm chart templates and Grafana dashboards
...
Signed-off-by: Johann Hoffmann <johann.hoffmann@mailbox.org>
2022-04-11 12:49:34 +02:00
Dmitry Dunaev
dcb2c7e476
[WIFI-7207] Chg: github actions ingress pathType to ImplementationSpecific
...
Signed-off-by: Dmitry Dunaev <dmitry@opsfleet.com>
2022-03-02 13:17:06 +03:00
Dmitry Dunaev
3aa5e233bc
[WIFI-7207] Chg: update aws-load-balancer-controller to 1.4.0
...
Signed-off-by: Dmitry Dunaev <dmitry@opsfleet.com>
2022-03-01 15:51:23 +03:00
Max Brenner
557b5bb489
fix actions-runner-controller deployment
2022-02-21 13:15:18 +01:00
Max
1e670cc4e1
update actions-runner-controller to version 0.16.1 (#190)
...
* update actions-runner-controller to version 0.16.1
2022-02-21 12:00:53 +01:00
Max
2a7e16e481
add alert for PVC consumption (#188)
2022-01-28 14:54:53 +01:00
Max
cd8756230b
update various Helm charts (#183)
2022-01-28 14:48:48 +01:00
Max Brenner
fb6ca122da
increase Github runner memory and CPU
2022-01-12 13:48:36 +01:00
Max
2852df9305
update kube-prometheus-stack (#181)
2022-01-11 17:06:45 +01:00
Max
8d75f7e00a
upgrade prometheus-operator (#173)
2022-01-03 10:54:35 +01:00
Max
7cff0f07fd
upgrade cluster autoscaler (#172)
2021-12-29 16:39:18 +01:00