refactor: update readme, remove unused code

Toboshii Nakama
2022-07-28 00:14:14 -05:00
parent 8e739ca16f
commit 89a5ea32af
67 changed files with 122 additions and 1707 deletions

README.md
View File

@@ -1,128 +1,128 @@
<img src="https://camo.githubusercontent.com/5b298bf6b0596795602bd771c5bddbb963e83e0f/68747470733a2f2f692e696d6775722e636f6d2f7031527a586a512e706e67" align="left" width="144px" height="144px"/>
# My home Kubernetes cluster :sailboat:
_... managed by Flux and serviced with RenovateBot_ :robot:
# My home operations repository 🎛🔨
_... managed by Flux, Renovate, and GitHub Actions_ 🤖
<br />
<br />
<br />
[![Discord](https://img.shields.io/discord/673534664354430999?color=7289da&label=DISCORD&style=for-the-badge)](https://discord.gg/sTMX7Vh)
[![k3s](https://img.shields.io/badge/k3s-v1.20.6-orange?style=for-the-badge)](https://k3s.io/)
[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white&style=for-the-badge)](https://github.com/pre-commit/pre-commit)
[![renovate](https://img.shields.io/badge/renovate-enabled-green?style=for-the-badge&logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjUgNSAzNzAgMzcwIj48Y2lyY2xlIGN4PSIxODkiIGN5PSIxOTAiIHI9IjE4NCIgZmlsbD0iI2ZlMiIvPjxwYXRoIGZpbGw9IiM4YmIiIGQ9Ik0yNTEgMjU2bC0zOC0zOGExNyAxNyAwIDAxMC0yNGw1Ni01NmMyLTIgMi02IDAtN2wtMjAtMjFhNSA1IDAgMDAtNyAwbC0xMyAxMi05LTggMTMtMTNhMTcgMTcgMCAwMTI0IDBsMjEgMjFjNyA3IDcgMTcgMCAyNGwtNTYgNTdhNSA1IDAgMDAwIDdsMzggMzh6Ii8+PHBhdGggZmlsbD0iI2Q1MSIgZD0iTTMwMCAyODhsLTggOGMtNCA0LTExIDQtMTYgMGwtNDYtNDZjLTUtNS01LTEyIDAtMTZsOC04YzQtNCAxMS00IDE1IDBsNDcgNDdjNCA0IDQgMTEgMCAxNXoiLz48cGF0aCBmaWxsPSIjYjMwIiBkPSJNMjg1IDI1OGw3IDdjNCA0IDQgMTEgMCAxNWwtOCA4Yy00IDQtMTEgNC0xNiAwbC02LTdjNCA1IDExIDUgMTUgMGw4LTdjNC01IDQtMTIgMC0xNnoiLz48cGF0aCBmaWxsPSIjYTMwIiBkPSJNMjkxIDI2NGw4IDhjNCA0IDQgMTEgMCAxNmwtOCA3Yy00IDUtMTEgNS0xNSAwbC05LThjNSA1IDEyIDUgMTYgMGw4LThjNC00IDQtMTEgMC0xNXoiLz48cGF0aCBmaWxsPSIjZTYyIiBkPSJNMjYwIDIzM2wtNC00Yy02LTYtMTctNi0yMyAwLTcgNy03IDE3IDAgMjRsNCA0Yy00LTUtNC0xMSAwLTE2bDgtOGM0LTQgMTEtNCAxNSAweiIvPjxwYXRoIGZpbGw9IiNiNDAiIGQ9Ik0yODQgMzA0Yy00IDAtOC0xLTExLTRsLTQ3LTQ3Yy02LTYtNi0xNiAwLTIybDgtOGM2LTYgMTYtNiAyMiAwbDQ3IDQ2YzYgNyA2IDE3IDAgMjNsLTggOGMtMyAzLTcgNC0xMSA0em0tMzktNzZjLTEgMC0zIDAtNCAybC04IDdjLTIgMy0yIDcgMCA5bDQ3IDQ3YTYgNiAwIDAwOSAwbDctOGMzLTIgMy02IDAtOWwtNDYtNDZjLTItMi0zLTItNS0yeiIvPjxwYXRoIGZpbGw9IiMxY2MiIGQ9Ik0xNTIgMTEzbDE4LTE4IDE4IDE4LTE4IDE4em0xLTM1bDE4LTE4IDE4IDE4LTE4IDE4em0tOTAgODlsMTgtMTggMTggMTgtMTggMTh6bTM1LTM2bDE4LTE4IDE4IDE4LTE4IDE4eiIvPjxwYXRoIGZpbGw9IiMxZGQiIGQ9Ik0xMzQgMTMxbDE4LTE4IDE4IDE4LTE4IDE4em0tMzUgMzZsMTgtMTggMTggMTgtMTggMTh6Ii8+PHBhdGggZmlsbD0iIzJiYiIgZD0iTTExNiAxNDlsMTgtMTggMTggMTgtMTggMTh6bTU0LTU0bDE4LTE4IDE4IDE4LTE4IDE4em0tODkgOTBsMTgtMTggMTggMTgtMTggMTh6bTEzOS04NWwyMyAyM2M0IDQgNCAxMSAwIDE2TDE0MiAyNDBjLTQgNC0xMSA0LTE1IDBsLTI0LTI0Yy00LTQtNC0xMSAwLTE1bDEwMS0xMDFjNS01IDEyLTUgMTYgMHoiLz48cGF0aCBmaWxsPSIjM2VlIiBkPSJNMTM0IDk1bDE4LTE4IDE4IDE4LTE4IDE4em0tNTQgMThsMTgtMTcgMTggMTctMTggMTh6bTU1LTUzbDE4LTE4IDE4IDE4LTE4IDE4em05MyA0OGwtOC04Yy00LTUtMTEtNS0xNiAwTDEwMyAyMDFjLTQgNC00IDExIDAgMTVsOCA4Yy00LTQtNC0xMSAwLTE1bDEwMS0xMDFjNS00IDEyLTQgMTYgMHoiLz48cGF0aCBmaWxsPSIjOWVlIiBkPSJNMjcgMTMxbDE4LTE4IDE4IDE4LTE4IDE4em01NC01M2wxOC0xOCAxOCAxOC0xOCAxOHoiLz48cGF0aCBmaWxsPSIjMGFhIiBkPSJNMjMwIDExMGwxMyAxM2M0IDQgNCAxMSAwIDE2TDE0MiAyNDBjLTQgNC0xMSA0LTE1IDBsLTEzLTEzYzQgNCAxMSA0IDE1IDBsMTAxLTEwMWM1LTUgNS0xMSAwLTE2eiIvPjxwYXRoIGZpbGw9IiMxYWIiIGQ9Ik0xMzQgMjQ4Yy00IDAtOC0yLTExLTVsLTIzLTIzYTE2IDE2IDAgMDEwLTIzTDIwMSA5NmExNiAxNiAwIDAxMjIgMGwyNCAyNGM2IDYgNiAxNiAwIDIyTDE0NiAyNDNjLTMgMy03IDUtMTIgNXptNzgtMTQ3bC00IDItMTAxIDEwMWE2IDYgMCAwMDAgOWwyMyAyM2E2IDYgMCAwMDkgMGwxMDEtMTAxYTYgNiAwIDAwMC05bC0yNC0yMy00LTJ6Ii8+PC9zdmc+)](https://github.com/renovatebot/renovate)
---
## :book:&nbsp; Overview
This repository _is_ my home Kubernetes cluster in a declarative state. [Flux](https://github.com/fluxcd/flux2) watches my [cluster](./cluster/) folder and applies changes to the cluster based on the YAML manifests.
Feel free to open a [Github issue](https://github.com/toboshii/home-cluster/issues/new/choose) or join the [k8s@home Discord](https://discord.gg/sTMX7Vh) if you have any questions.
This repository is built off the [k8s-at-home/template-cluster-k3s](https://github.com/k8s-at-home/template-cluster-k3s) repository.
---
## :sparkles:&nbsp; Cluster setup
This cluster consists of VMs provisioned on [PVE](https://www.proxmox.com/en/proxmox-ve) via the [Terraform Proxmox provider](https://github.com/Telmate/terraform-provider-proxmox). These run [k3s](https://k3s.io/), provisioned on top of Ubuntu 20.10 using the [Ansible](https://www.ansible.com/) Galaxy role [ansible-role-k3s](https://github.com/PyratLabs/ansible-role-k3s). This cluster is not hyper-converged, as block storage is provided by the underlying PVE Ceph cluster using rook-ceph-external.
See my [server/ansible](./server/ansible/) directory for my playbooks and roles, and [server/terraform](./server/terraform) for infrastructure provisioning.
## :art:&nbsp; Cluster components
- [kube-vip](https://kube-vip.io/): Uses BGP to load balance the control-plane API, making it highly available without requiring an external HA proxy.
- [calico](https://docs.projectcalico.org/about/about-calico): For internal cluster networking using BGP.
- [traefik](https://traefik.io/): Provides ingress for cluster services.
- [rook-ceph](https://rook.io/): Provides persistent volumes, allowing any application to consume RBD block storage from the underlying PVE cluster.
- [SOPS](https://toolkit.fluxcd.io/guides/mozilla-sops/): Encrypts secrets so they are safe to store, even in a public repository.
- [external-dns](https://github.com/kubernetes-sigs/external-dns): Creates DNS entries in a separate [coredns](https://github.com/coredns/coredns) deployment, which is backed by my cluster's [etcd](https://github.com/etcd-io/etcd) deployment.
- [cert-manager](https://cert-manager.io/docs/): Configured to automatically create TLS certs for all ingress services using Let's Encrypt.
- [kasten-k10](https://www.kasten.io): Provides disaster recovery via snapshots and out-of-band backups.
---
## :open_file_folder:&nbsp; Repository structure
The Git repository contains the following directories under `cluster`, ordered below by how Flux will apply them.
- **base** directory is the entrypoint to Flux
- **crds** directory contains custom resource definitions (CRDs) that need to exist globally in your cluster before anything else exists
- **core** directory (depends on **crds**) contains important infrastructure applications (grouped by namespace) that should never be pruned by Flux
- **apps** directory (depends on **core**) is where your common applications (grouped by namespace) could be placed; Flux will prune resources here if they are no longer tracked by Git
```
./cluster
├── ./apps
├── ./base
├── ./core
└── ./crds
```
---
## :robot:&nbsp; Automate all the things!
- [Github Actions](https://docs.github.com/en/actions) for checking code formatting
- Rancher [System Upgrade Controller](https://github.com/rancher/system-upgrade-controller) to apply updates to k3s
- [Renovate](https://github.com/renovatebot/renovate), with the help of the [k8s-at-home/renovate-helm-releases](https://github.com/k8s-at-home/renovate-helm-releases) GitHub action, keeps my application charts and container images up-to-date
---
## :spider_web:&nbsp; Networking
In my network, Calico is configured to peer over BGP with my Brocade ICX 6610. With BGP enabled, I advertise a load balancer using `externalIPs` on my Kubernetes services, which means I do not need `MetalLB`. Another benefit is that I can hit any pod's IP directly from any device on my local network (see the sketch after the table below). All physical hardware (including local clients) is interconnected with 10gig networking, with a separate dedicated 10gig network for Ceph traffic.
| Name | CIDR |
| --------------------------- | --------------- |
| Management | `10.75.10.0/24` |
| Physical Servers | `10.75.30.0/24` |
| CoroSync0 | `10.75.31.0/24` |
| CoroSync1 | `10.75.32.0/24` |
| Ceph Cluster | `10.75.33.0/24` |
| Virtual Servers | `10.75.40.0/24` |
| K8s external services (BGP) | `10.75.45.0/24` |
| K8s pods | `172.22.0.0/16` |
| K8s services | `172.24.0.0/16` |
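As a sketch of how a workload consumes the BGP-advertised `10.75.45.0/24` range above (the service name, ports, and address here are hypothetical), a `Service` simply requests an address via `externalIPs` and Calico announces it to the switch:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: echo-server # hypothetical example service
  namespace: default
spec:
  selector:
    app: echo-server
  ports:
    - port: 80
      targetPort: 8080
  # Calico advertises this address over BGP, so any device on the
  # network learns a route to it without needing MetalLB.
  externalIPs:
    - 10.75.45.10
```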
## :man_shrugging:&nbsp; DNS
_(this section blindly copied from [Devin Buhl](https://github.com/onedr0p/home-cluster) as I could never attempt to explain this in a better way)_
To preface this, I should mention that I only use one domain name for internal and externally facing applications. This is also the most complicated thing to explain, but I will try to sum it up.
On [pfSense](https://arstechnica.com/gadgets/2021/03/buffer-overruns-license-violations-and-bad-code-freebsd-13s-close-call/) under `Services: DNS Resolver: Domain Overrides` I have a `Domain Override` set to my domain with the address pointing to my _in-cluster-non-cluster service_ CoreDNS load balancer IP. This allows me to use [Split-horizon DNS](https://en.wikipedia.org/wiki/Split-horizon_DNS). [external-dns](https://github.com/kubernetes-sigs/external-dns) reads my cluster's `Ingress`es and inserts DNS records containing the sub-domain and load balancer IP (of traefik) into the _in-cluster-non-cluster service_ CoreDNS service and into Cloudflare, depending on whether an annotation is present on the ingress. See the diagram below for a visual representation.
<div align="center">
<img src="https://user-images.githubusercontent.com/213795/116820353-91f6e480-ab42-11eb-9109-95e485df9249.png" align="center" />
[![Discord](https://img.shields.io/discord/673534664354430999?style=for-the-badge&label=discord&logo=discord&logoColor=white)](https://discord.gg/k8s-at-home)
[![talos](https://img.shields.io/badge/talos-v1.1.2-brightgreen?style=for-the-badge&logo=linux&logoColor=white)](https://www.talos.dev/)
[![kubernetes](https://img.shields.io/badge/kubernetes-v1.24.3-brightgreen?style=for-the-badge&logo=kubernetes&logoColor=white)](https://kubernetes.io/)
[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white&style=for-the-badge)](https://github.com/pre-commit/pre-commit)
[![GitHub Workflow Status](https://img.shields.io/github/workflow/status/toboshii/home-ops/Schedule%20-%20Renovate?label=renovate&logo=renovatebot&style=for-the-badge)](https://github.com/toboshii/home-ops/actions/workflows/schedule-renovate.yaml)
[![Lines of code](https://img.shields.io/tokei/lines/github/toboshii/home-ops?style=for-the-badge&color=brightgreen&label=lines&logo=codefactor&logoColor=white)](https://github.com/toboshii/home-ops/graphs/contributors)
</div>
---
## :gear:&nbsp; Hardware
## 📖 Overview
| Device | Count | OS Disk Size | Data Disk Size | RAM | Purpose |
| ---------------- | ----- | ------------ | ---------------------------------- | ----- | ---------------------------------------- |
| Intel R1208GL4DS | 4 | 120GB SSD | 2x480GB SSD<br/>4x900GB 10.6k SAS | 64GB | Proxmox hypervisors<br/>and Ceph cluster |
| Intel R1208GL4DS | 1 | 120GB SSD | 2x900GB 10.6k SAS | 32GB | Backup cold spare |
| NAS (franxx) | 1 | 120GB SSD | 16x8TB RAIDZ2<br/>6x4TB ZFS Mirror | 192GB | Media and shared file storage |
This is a mono repository for my home infrastructure and Kubernetes cluster implementing Infrastructure as Code (IaC) and GitOps practices using tools like [Kubernetes](https://kubernetes.io/), [Flux](https://github.com/fluxcd/flux2), [Renovate](https://github.com/renovatebot/renovate) and [GitHub Actions](https://github.com/features/actions).
Feel free to open a [Github issue](https://github.com/toboshii/home-ops/issues/new/choose) or join the [k8s@home Discord](https://discord.gg/sTMX7Vh) if you have any questions.
---
## :wrench:&nbsp; Tools
## ⛵ Kubernetes
| Tool | Purpose |
| ------------------------------------------------------ | ------------------------------------------------------------------------- |
| [direnv](https://github.com/direnv/direnv) | Sets `KUBECONFIG` environment variable based on present working directory |
| [go-task](https://github.com/go-task/task) | Alternative to makefiles; who honestly likes those? |
| [pre-commit](https://github.com/pre-commit/pre-commit) | Enforces code consistency and verifies no secrets are pushed |
| [stern](https://github.com/stern/stern) | Tail logs in Kubernetes |
This repo generally attempts to follow the structure and practices of the excellent [k8s-at-home/template-cluster-k3s](https://github.com/k8s-at-home/template-cluster-k3s); check it out if you're uncomfortable starting out with an immutable operating system.
### Installation
The cluster is running on [Talos Linux](https://talos.dev/), an immutable and ephemeral Linux distribution built around Kubernetes, deployed on bare-metal. [Rook Ceph](https://rook.io/) running hyper-converged with workloads provides persistent block and object storage, while a separate server provides bulk (NFS) file storage.
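For a rough idea of what declaring a node looks like, here is a heavily trimmed sketch of a Talos machine config; every value below is a placeholder, not this cluster's actual configuration:
```yaml
# Trimmed Talos machine config sketch; all values are placeholders.
version: v1alpha1
machine:
  type: controlplane # or "worker"
  install:
    disk: /dev/sda # disk the immutable OS image is installed to
cluster:
  clusterName: home-ops
  controlPlane:
    endpoint: https://k8s.example.com:6443
```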
### Core components
- [cilium/cilium](https://github.com/cilium/cilium): Internal Kubernetes networking plugin.
- [rook/rook](https://github.com/rook/rook): Distributed block storage for persistent storage.
- [mozilla/sops](https://toolkit.fluxcd.io/guides/mozilla-sops/): Manages secrets for Kubernetes, Ansible and Terraform.
- [kubernetes-sigs/external-dns](https://github.com/kubernetes-sigs/external-dns): Automatically manages DNS records from my cluster in a cloud DNS provider.
- [jetstack/cert-manager](https://cert-manager.io/docs/): Creates SSL certificates for services in my Kubernetes cluster (see the sketch after this list).
- [kubernetes/ingress-nginx](https://github.com/kubernetes/ingress-nginx/): Ingress controller to expose HTTP traffic to pods over DNS.
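As an illustration of the cert-manager piece, a Let's Encrypt `ClusterIssuer` using DNS-01 validation through Cloudflare might look roughly like this; the email and secret names are placeholders rather than this repo's actual manifest:
```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com # placeholder
    privateKeySecretRef:
      name: letsencrypt-production # ACME account key lives in this Secret
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token # placeholder Secret name
              key: api-token
```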
### GitOps
[Flux](https://github.com/fluxcd/flux2) watches my [cluster](./cluster/) folder (see Directories below) and makes the changes to my cluster based on the YAML manifests.
[Renovate](https://github.com/renovatebot/renovate) watches my **entire** repository looking for dependency updates; when they are found, a PR is automatically created. When PRs are merged, [Flux](https://github.com/fluxcd/flux2) applies the changes to my cluster.
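A minimal sketch of the kind of Flux `Kustomization` that drives this reconciliation; the names, path, and interval here are illustrative only:
```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  interval: 10m
  path: ./cluster/apps
  prune: true # resources removed from Git are removed from the cluster
  sourceRef:
    kind: GitRepository
    name: home-ops # illustrative GitRepository name
  dependsOn:
    - name: core # apps reconcile only after core is ready
```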
### Directories
This Git repository contains the following directories (_kustomizations_) under [cluster](./cluster/).
```sh
📁 cluster # k8s cluster defined as code
├─📁 bootstrap # contains the initial kustomization used to install flux
├─📁 flux # flux, gitops operator, loaded before everything
├─📁 crds # custom resources, loaded before 📁 core and 📁 apps
├─📁 charts # helm repos, loaded before 📁 core and 📁 apps
├─📁 config # cluster config, loaded before 📁 core and 📁 apps
├─📁 core # crucial apps, namespaced dir tree, loaded before 📁 apps
└─📁 apps # regular apps, namespaced dir tree, loaded last
```
### Networking
| Name | CIDR |
|----------------------------------------------|-----------------|
| Kubernetes Nodes | `10.75.40.0/24` |
| Kubernetes external services (Cilium w/ BGP) | `10.75.45.0/24` |
| Kubernetes pods | `172.22.0.0/16` |
| Kubernetes services | `172.24.0.0/16` |
## 🌐 DNS
### Ingress Controller
Over WAN, I have port forwarded ports `80` and `443` to the load balancer IP of my ingress controller that's running in my Kubernetes cluster.
[Cloudflare](https://www.cloudflare.com/) works as a proxy to hide my home's WAN IP and also as a firewall. When not on my home network, all the traffic coming into my ingress controller on ports `80` and `443` comes from Cloudflare. In `VyOS` I block all IPs not originating from [Cloudflare's list of IP ranges](https://www.cloudflare.com/ips/).
🔸 _Cloudflare is also configured to GeoIP block all countries except a few I have whitelisted_
### Internal DNS
[k8s_gateway](https://github.com/ori-edge/k8s_gateway) is deployed on my router running [VyOS](https://vyos.io/). With this setup, `k8s_gateway` has direct access to my cluster's ingress records and serves DNS for them on my internal network.
Without much engineering of DNS @home, these options have made my `VyOS` router a single point of failure for DNS. I believe this is OK, though, because my router _should_ have the most uptime of all my systems.
### External DNS
[external-dns](https://github.com/kubernetes-sigs/external-dns) is deployed in my cluster and configured to sync DNS records to [Cloudflare](https://www.cloudflare.com/). The only ingresses `external-dns` looks at to gather DNS records to put in `Cloudflare` are ones where I explicitly set an annotation of `external-dns.home.arpa/enabled: "true"`.
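For example, an `Ingress` opted in to Cloudflare syncing might carry the annotation like so (the hostname and backend service are placeholders):
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: echo-server
  annotations:
    # opts this record in to external-dns -> Cloudflare syncing
    external-dns.home.arpa/enabled: "true"
spec:
  ingressClassName: nginx
  rules:
    - host: echo.example.com # placeholder hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: echo-server
                port:
                  number: 80
```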
---
## :handshake:&nbsp; Thanks
## 🔧 Hardware
A lot of inspiration for my cluster came from the people who have shared their clusters over at [awesome-home-kubernetes](https://github.com/k8s-at-home/awesome-home-kubernetes).
| Device | Count | OS Disk Size | Data Disk Size | RAM | Operating System | Purpose |
|---------------------------|-------|--------------|----------------------------|-------|------------------|--------------------------------|
| Dell R220 | 1 | 120GB SSD | N/A | 16GB | VyOS 1.4 | Router |
| HP S01-pf1000 | 3 | 120GB SSD | N/A | 8GB | Talos Linux | Kubernetes Control Nodes |
| HP S01-pf1000 | 3 | 120GB SSD | 1TB NVMe (rook-ceph) | 32GB | Talos Linux | Kubernetes Workers |
| SuperMicro SC836 | 1 | 120GB SSD | 16x8TB + 16x3TB ZFS RAIDZ2 | 192GB | Ubuntu 20.04 | NFS |
| Brocade ICX 6610 | 1 | N/A | N/A | N/A | N/A | Core Switch |
| Raspberry Pi 4B | 1 | 32GB SD Card | N/A | 4GB | PiKVM | Network KVM |
| TESmart 8 Port KVM Switch | 1 | N/A | N/A | N/A | N/A | Network KVM switch for PiKVM |
| APC SUA3000RMXL3U w/ NIC | 1 | N/A | N/A | N/A | N/A | UPS |
| APC AP7930 | 1 | N/A | N/A | N/A | N/A | PDU |
---
## 🤝 Thanks
Thanks to all folks who donate their time to the [Kubernetes @Home](https://github.com/k8s-at-home/) community. A lot of inspiration for my cluster came from those that have shared their clusters over at [awesome-home-kubernetes](https://github.com/k8s-at-home/awesome-home-kubernetes).
---
## 📜 Changelog
See [commit history](https://github.com/onedr0p/home-ops/commits/main)
---
## 🔏 License
See [LICENSE](./LICENSE)

View File

@@ -1,53 +0,0 @@
[defaults]
#--- General settings
nocows = True
forks = 8
module_name = command
deprecation_warnings = True
executable = /bin/bash
#--- Files/Directory settings
log_path = ~/ansible.log
inventory = ./inventory
library = /usr/share/my_modules
remote_tmp = ~/.ansible/tmp
local_tmp = ~/.ansible/tmp
roles_path = ./roles
retry_files_enabled = False
#--- Fact Caching settings
fact_caching = jsonfile
fact_caching_connection = ~/.ansible/facts_cache
fact_caching_timeout = 7200
#--- SSH settings
remote_port = 22
timeout = 60
host_key_checking = False
ssh_executable = /usr/bin/ssh
private_key_file = ~/.ssh/id_ed25519
force_valid_group_names = ignore
#--- Speed
callback_whitelist = ansible.posix.profile_tasks
internal_poll_interval = 0.001
[inventory]
unparsed_is_failed = true
[privilege_escalation]
become = True
become_method = sudo
become_user = root
become_ask_pass = False
[ssh_connection]
scp_if_ssh = smart
transfer_method = smart
retries = 3
timeout = 10
ssh_args = -o ControlMaster=auto -o ControlPersist=30m -o Compression=yes -o ServerAliveInterval=15s
pipelining = True
control_path = %(directory)s/%%h-%%r

View File

@@ -1,15 +0,0 @@
---
# Encapsulation type
calico_encapsulation: "None"
# BGP Peer IP
# (usually your router IP address)
calico_bgp_peer_ip: 10.75.40.1
# BGP Autonomous System Number
# (must be the same across all BGP peers)
calico_bgp_as_number: 64512
# BGP Network you want services to consume
# (this network should not exist or be defined anywhere in your network)
calico_bgp_external_ips: 10.75.45.0/24
# CIDR of the host node interface Calico should use
calico_node_cidr: 10.75.40.0/24

View File

@@ -1,35 +0,0 @@
---
#
# Below vars are for the xanmanning.k3s role
# ...see https://github.com/PyratLabs/ansible-role-k3s#globalcluster-variables
#
# Use a specific version of k3s
k3s_release_version: "v1.21.2+k3s1"
# Install using hard links rather than symbolic links.
# ...if you are using the system-upgrade-controller you will need to use hard links rather than symbolic links as the controller will not be able to follow symbolic links.
k3s_install_hard_links: true
# Escalate user privileges for all tasks.
k3s_become_for_all: true
# Enable debugging
k3s_debug: false
# HA settings
k3s_etcd_datastore: true
k3s_registration_address: 10.75.45.5
k3s_registration_domain: k8s-api.dfw.56k.sh
k3s_server_manifests_templates:
  - "calico/calico-installation.yaml.j2"
  - "calico/calico-bgpconfiguration.yaml.j2"
  - "calico/calico-bgppeer.yaml.j2"
  - "kube-vip/kube-vip-rbac.yaml.j2"
  - "kube-vip/kube-vip-daemonset.yaml.j2"
# Custom manifest URLs
k3s_server_manifests_urls:
  - url: https://docs.projectcalico.org/archive/v3.19/manifests/tigera-operator.yaml
    filename: tigera-operator.yaml

View File

@@ -1,11 +0,0 @@
---
kubevip_interface: eth0
kubevip_bgp_peer_ip: 10.75.40.1
kubevip_address: 10.75.45.5
kubevip_bgp_as_number: 64512
kubevip_bgp_peer_as_number: 64512

View File

@@ -1,8 +0,0 @@
---
# Enable rsyslog
# ...requires a rsyslog server already set up
rsyslog:
  enabled: false
  ip: 10.75.45.102
  port: 1514

View File

@@ -1,20 +0,0 @@
---
# Enable to skip apt upgrade
skip_upgrade_packages: false
# Enable to skip removing crufty packages
skip_remove_packages: false
# Timezone for the servers
timezone: "America/Chicago"
# Set custom ntp servers
ntp_servers:
  primary:
    - "gw.dfw.56k.sh"
  fallback:
    - "0.us.pool.ntp.org"
    - "1.us.pool.ntp.org"
    - "2.us.pool.ntp.org"
    - "3.us.pool.ntp.org"
# Additional ssh public keys to add to the nodes
# ssh_authorized_keys:

View File

@@ -1,9 +0,0 @@
---
nvidia_driver:
  version: "465.27"
  checksum: "sha256:7e69ffa85bdee6aaaa6b6ea7e1db283b0199f9ab21e41a27dc9048f249dc3171"
nvidia_patch:
  version: "d5d564b888aaef99fdd45e23f2fc3eae8e337a39"
  checksum: "sha256:d80928c381d141734c13463d69bfaecff77ac66ee6f9036b2f0348b8602989d8"

View File

@@ -1,40 +0,0 @@
---
# https://rancher.com/docs/k3s/latest/en/installation/install-options/server-config/
# https://github.com/PyratLabs/ansible-role-k3s#server-control-plane-configuration
# Define the host as control plane nodes
k3s_control_node: true
# k3s settings for all control-plane nodes
k3s_server:
  node-ip: "{{ ansible_host }}"
  tls-san:
    - "{{ k3s_registration_domain }}"
    - "{{ k3s_registration_address }}"
  docker: false
  flannel-backend: "none" # This needs to be in quotes
  disable:
    - flannel
    - traefik
    - servicelb
    - metrics-server
    - local-storage
  disable-network-policy: true
  disable-cloud-controller: true
  write-kubeconfig-mode: "644"
  # Network CIDR to use for pod IPs
  cluster-cidr: "172.22.0.0/16"
  # Network CIDR to use for service IPs
  service-cidr: "172.24.0.0/16"
  kubelet-arg:
    - "feature-gates=GracefulNodeShutdown=true"
  # Required to use kube-prometheus-stack
  kube-controller-manager-arg:
    - "address=0.0.0.0"
    - "bind-address=0.0.0.0"
  kube-proxy-arg:
    - "metrics-bind-address=0.0.0.0"
  kube-scheduler-arg:
    - "address=0.0.0.0"
    - "bind-address=0.0.0.0"
  etcd-expose-metrics: true

View File

@@ -1,12 +0,0 @@
---
# https://rancher.com/docs/k3s/latest/en/installation/install-options/agent-config/
# https://github.com/PyratLabs/ansible-role-k3s#agent-worker-configuration
# Don't define the host as control plane nodes
k3s_control_node: false
# k3s settings for all worker nodes
k3s_agent:
  node-ip: "{{ ansible_host }}"
  kubelet-arg:
    - "feature-gates=GracefulNodeShutdown=true"

View File

@@ -1,15 +0,0 @@
---
# IP address of node
ansible_host: "10.75.40.24"
# Ansible user to ssh into servers with
ansible_user: "ubuntu"
# ansible_ssh_pass: "ubuntu"
# ansible_ssh_common_args: "-o UserKnownHostsFile=/dev/null"
ansible_become_pass: "ubuntu"
# Set enabled to true to mark this host as running a distributed storage rook-ceph
rook_ceph:
  enabled: false
  devices:
    - /dev/nvme0n1

View File

@@ -1,16 +0,0 @@
---
# IP address of node
ansible_host: "10.75.40.10"
# Ansible user to ssh into servers with
ansible_user: "ubuntu"
# ansible_ssh_pass: "ubuntu"
# ansible_ssh_common_args: "-o UserKnownHostsFile=/dev/null"
ansible_become_pass: "ubuntu"
# Set enabled to true to mark this host as running a distributed storage rook-ceph
rook_ceph:
  enabled: false
  # devices:
  #   - /dev/nvme0n1

View File

@@ -1,16 +0,0 @@
---
# IP address of node
ansible_host: "10.75.40.11"
# Ansible user to ssh into servers with
ansible_user: "ubuntu"
# ansible_ssh_pass: "ubuntu"
# ansible_ssh_common_args: "-o UserKnownHostsFile=/dev/null"
ansible_become_pass: "ubuntu"
# Set enabled to true to mark this host as running a distributed storage rook-ceph
rook_ceph:
  enabled: false
  # devices:
  #   - /dev/nvme0n1

View File

@@ -1,16 +0,0 @@
---
# IP address of node
ansible_host: "10.75.40.12"
# Ansible user to ssh into servers with
ansible_user: "ubuntu"
# ansible_ssh_pass: "ubuntu"
# ansible_ssh_common_args: "-o UserKnownHostsFile=/dev/null"
ansible_become_pass: "ubuntu"
# Set enabled to true to mark this host as running a distributed storage rook-ceph
rook_ceph:
  enabled: false
  # devices:
  #   - /dev/nvme0n1

View File

@@ -1,15 +0,0 @@
---
# IP address of node
ansible_host: "10.75.40.20"
# Ansible user to ssh into servers with
ansible_user: "ubuntu"
# ansible_ssh_pass: "ubuntu"
# ansible_ssh_common_args: "-o UserKnownHostsFile=/dev/null"
ansible_become_pass: "ubuntu"
# Set enabled to true to mark this host as running a distributed storage rook-ceph
rook_ceph:
  enabled: false
  devices:
    - /dev/nvme0n1

View File

@@ -1,15 +0,0 @@
---
# IP address of node
ansible_host: "10.75.40.21"
# Ansible user to ssh into servers with
ansible_user: "ubuntu"
# ansible_ssh_pass: "ubuntu"
# ansible_ssh_common_args: "-o UserKnownHostsFile=/dev/null"
ansible_become_pass: "ubuntu"
# Set enabled to true to mark this host as running a distributed storage rook-ceph
rook_ceph:
  enabled: false
  devices:
    - /dev/nvme0n1

View File

@@ -1,15 +0,0 @@
---
# IP address of node
ansible_host: "10.75.40.22"
# Ansible user to ssh into servers with
ansible_user: "ubuntu"
# ansible_ssh_pass: "ubuntu"
# ansible_ssh_common_args: "-o UserKnownHostsFile=/dev/null"
ansible_become_pass: "ubuntu"
# Set enabled to true to mark this host as running a distributed storage rook-ceph
rook_ceph:
  enabled: false
  devices:
    - /dev/nvme0n1

View File

@@ -1,15 +0,0 @@
---
# IP address of node
ansible_host: "10.75.40.23"
# Ansible user to ssh into servers with
ansible_user: "ubuntu"
# ansible_ssh_pass: "ubuntu"
# ansible_ssh_common_args: "-o UserKnownHostsFile=/dev/null"
ansible_become_pass: "ubuntu"
# Set enabled to true to mark this host as running a distributed storage rook-ceph
rook_ceph:
  enabled: false
  devices:
    - /dev/nvme0n1

View File

@@ -1,5 +0,0 @@
---
ansible_host: "10.75.30.15"
ansible_user: toboshii
ansible_become: true

View File

@@ -1,27 +0,0 @@
---
all:
  children:
    # Control Plane group, do not change the 'control-plane' name
    # hosts should match the filenames in 'host_vars'
    master-nodes:
      hosts:
        k8s-master01:
        k8s-master02:
        k8s-master03:
    # Node group, do not change the 'node' name
    # hosts should match the filenames in 'host_vars'
    worker-nodes:
      hosts:
        k8s-worker01:
        k8s-worker02:
        k8s-worker03:
        k8s-worker04:
    gpu-nodes:
      hosts:
        k8s-cuda01:
    # Storage group, these are my NAS devices
    # hosts should match the filenames in 'host_vars'
    storage:
      hosts:
        nas-franxx:

View File

@@ -1,26 +0,0 @@
---
- hosts:
    - master-nodes
    - worker-nodes
    - gpu-nodes
  become: true
  gather_facts: true
  any_errors_fatal: true
  pre_tasks:
    - name: Pausing for 5 seconds...
      pause:
        seconds: 5
  roles:
    - k3s
- hosts:
    - gpu-nodes
  become: true
  gather_facts: true
  any_errors_fatal: true
  pre_tasks:
    - name: Pausing for 5 seconds...
      pause:
        seconds: 5
  roles:
    - nvidia

View File

@@ -1,33 +0,0 @@
---
- hosts:
    - master-nodes
    - worker-nodes
    - gpu-nodes
  become: true
  gather_facts: true
  any_errors_fatal: true
  pre_tasks:
    - name: Pausing for 5 seconds...
      pause:
        seconds: 5
  tasks:
    - name: kill k3s
      ansible.builtin.command: /usr/local/bin/k3s-killall.sh
    - name: uninstall k3s
      ansible.builtin.command:
        cmd: /usr/local/bin/k3s-uninstall.sh
        removes: /usr/local/bin/k3s-uninstall.sh
    - name: uninstall k3s agent
      ansible.builtin.command:
        cmd: /usr/local/bin/k3s-agent-uninstall.sh
        removes: /usr/local/bin/k3s-agent-uninstall.sh
    - name: gather list of CNI files to delete
      find:
        paths: /etc/cni/net.d
        patterns: "*"
      register: files_to_delete
    - name: delete CNI files
      ansible.builtin.file:
        path: "{{ item.path }}"
        state: absent
      loop: "{{ files_to_delete.files }}"

View File

@@ -1,14 +0,0 @@
---
- hosts:
    - master-nodes
    - worker-nodes
    - gpu-nodes
  become: true
  gather_facts: true
  any_errors_fatal: true
  pre_tasks:
    - name: Pausing for 5 seconds...
      pause:
        seconds: 5
  roles:
    - k3s

View File

@@ -1,14 +0,0 @@
---
- hosts:
    - master-nodes
    - worker-nodes
    - gpu-nodes
  become: true
  gather_facts: true
  any_errors_fatal: true
  pre_tasks:
    - name: Pausing for 5 seconds...
      pause:
        seconds: 5
  roles:
    - ubuntu

View File

@@ -1,23 +0,0 @@
---
- hosts:
    - master-nodes
    - worker-nodes
    - gpu-nodes
  become: true
  gather_facts: true
  any_errors_fatal: true
  pre_tasks:
    - name: Pausing for 5 seconds...
      pause:
        seconds: 5
  tasks:
    - name: upgrade
      ansible.builtin.apt:
        upgrade: full
        update_cache: true
        cache_valid_time: 3600
        autoclean: true
        autoremove: true
      register: apt_upgrade
      retries: 5
      until: apt_upgrade is success

View File

@@ -1 +0,0 @@
jmespath==0.10.0

View File

@@ -1,6 +0,0 @@
---
roles:
  - src: xanmanning.k3s
    version: v2.11.1
collections:
  - name: community.general

View File

@@ -1,13 +0,0 @@
---
- name: addons | check if cluster is installed
  ansible.builtin.stat:
    path: "/etc/rancher/k3s/config.yaml"
  register: k3s_check_installed
  check_mode: false
- name: addons | set manifest facts
  ansible.builtin.set_fact:
    k3s_server_manifests_templates: []
    k3s_server_manifests_urls: []
  when: k3s_check_installed.stat.exists

View File

@@ -1,13 +0,0 @@
---
- name: cleanup | remove deployed manifest templates
  ansible.builtin.file:
    path: "{{ k3s_server_manifests_dir }}/{{ item | basename | regex_replace('\\.j2$', '') }}"
    state: absent
  loop: "{{ k3s_server_manifests_templates }}"
- name: cleanup | remove deployed manifest urls
  ansible.builtin.file:
    path: "{{ k3s_server_manifests_dir }}/{{ item.filename }}"
    state: absent
  loop: "{{ k3s_server_manifests_urls }}"

View File

@@ -1,17 +0,0 @@
---
- include: addons.yml
  tags:
    - addons
- name: k3s | cluster configuration
  include_role:
    name: xanmanning.k3s
    public: true
- include: cleanup.yml
  tags:
    - cleanup
- include: kubeconfig.yml
  tags:
    - kubeconfig

View File

@@ -1,10 +0,0 @@
---
apiVersion: crd.projectcalico.org/v1
kind: BGPConfiguration
metadata:
  name: default
spec:
  serviceClusterIPs:
    - cidr: "{{ k3s_server['service-cidr'] }}"
  serviceExternalIPs:
    - cidr: "{{ calico_bgp_external_ips }}"

View File

@@ -1,8 +0,0 @@
---
apiVersion: crd.projectcalico.org/v1
kind: BGPPeer
metadata:
  name: global
spec:
  peerIP: {{ calico_bgp_peer_ip }}
  asNumber: {{ calico_bgp_as_number }}

View File

@@ -1,18 +0,0 @@
#jinja2:lstrip_blocks: True
---
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    # Note: The ipPools section cannot be modified post-install.
    ipPools:
      - blockSize: 26
        cidr: "{{ k3s_server['cluster-cidr'] }}"
        encapsulation: "{{ calico_encapsulation }}"
        natOutgoing: Enabled
        nodeSelector: all()
    nodeAddressAutodetectionV4:
      cidrs:
        - "{{ calico_node_cidr }}"

View File

@@ -1,64 +0,0 @@
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-vip
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: kube-vip
  template:
    metadata:
      labels:
        name: kube-vip
    spec:
      containers:
        - name: kube-vip
          image: ghcr.io/kube-vip/kube-vip:v0.3.5
          imagePullPolicy: IfNotPresent
          args:
            - manager
          env:
            - name: vip_arp
              value: "false"
            - name: vip_interface
              value: lo
            - name: port
              value: "6443"
            - name: vip_cidr
              value: "32"
            - name: cp_enable
              value: "true"
            - name: cp_namespace
              value: kube-system
            - name: vip_startleader
              value: "false"
            - name: vip_loglevel
              value: "5"
            - name: bgp_enable
              value: "true"
            - name: bgp_routerinterface
              value: "{{ kubevip_interface }}"
            - name: bgp_as
              value: "{{ kubevip_bgp_as_number }}"
            - name: bgp_peeraddress
              value: "{{ kubevip_bgp_peer_ip }}"
            - name: bgp_peeras
              value: "{{ kubevip_bgp_peer_as_number }}"
            - name: vip_address
              value: "{{ kubevip_address }}"
          securityContext:
            capabilities:
              add:
                - NET_ADMIN
                - NET_RAW
                - SYS_TIME
      hostNetwork: true
      nodeSelector:
        node-role.kubernetes.io/master: "true"
      serviceAccountName: kube-vip
      tolerations:
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule

View File

@@ -1,33 +0,0 @@
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-vip
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  name: system:kube-vip-role
rules:
  - apiGroups: [""]
    resources: ["services", "services/status", "nodes"]
    verbs: ["list", "get", "watch", "update"]
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    verbs: ["list", "get", "watch", "update", "create"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: system:kube-vip-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:kube-vip-role
subjects:
  - kind: ServiceAccount
    name: kube-vip
    namespace: kube-system

View File

@@ -1,2 +0,0 @@
blacklist nouveau
options nouveau modeset=0

View File

@@ -1,53 +0,0 @@
[plugins.opt]
path = "{{ .NodeConfig.Containerd.Opt }}"
[plugins.cri]
stream_server_address = "127.0.0.1"
stream_server_port = "10010"
enable_selinux = {{ .NodeConfig.SELinux }}
{{- if .IsRunningInUserNS }}
disable_cgroup = true
disable_apparmor = true
restrict_oom_score_adj = true
{{end}}
{{- if .NodeConfig.AgentConfig.PauseImage }}
sandbox_image = "{{ .NodeConfig.AgentConfig.PauseImage }}"
{{end}}
{{- if .NodeConfig.AgentConfig.Snapshotter }}
[plugins.cri.containerd]
disable_snapshot_annotations = true
snapshotter = "{{ .NodeConfig.AgentConfig.Snapshotter }}"
{{end}}
{{- if not .NodeConfig.NoFlannel }}
[plugins.cri.cni]
bin_dir = "{{ .NodeConfig.AgentConfig.CNIBinDir }}"
conf_dir = "{{ .NodeConfig.AgentConfig.CNIConfDir }}"
{{end}}
[plugins.cri.containerd.runtimes.runc]
runtime_type = "io.containerd.runtime.v1.linux"
[plugins.linux]
runtime = "nvidia-container-runtime"
{{ if .PrivateRegistryConfig }}
{{ if .PrivateRegistryConfig.Mirrors }}
[plugins.cri.registry.mirrors]{{end}}
{{range $k, $v := .PrivateRegistryConfig.Mirrors }}
[plugins.cri.registry.mirrors."{{$k}}"]
endpoint = [{{range $i, $j := $v.Endpoints}}{{if $i}}, {{end}}{{printf "%q" .}}{{end}}]
{{end}}
{{range $k, $v := .PrivateRegistryConfig.Configs }}
{{ if $v.Auth }}
[plugins.cri.registry.configs."{{$k}}".auth]
{{ if $v.Auth.Username }}username = {{ printf "%q" $v.Auth.Username }}{{end}}
{{ if $v.Auth.Password }}password = {{ printf "%q" $v.Auth.Password }}{{end}}
{{ if $v.Auth.Auth }}auth = {{ printf "%q" $v.Auth.Auth }}{{end}}
{{ if $v.Auth.IdentityToken }}identitytoken = {{ printf "%q" $v.Auth.IdentityToken }}{{end}}
{{end}}
{{ if $v.TLS }}
[plugins.cri.registry.configs."{{$k}}".tls]
{{ if $v.TLS.CAFile }}ca_file = "{{ $v.TLS.CAFile }}"{{end}}
{{ if $v.TLS.CertFile }}cert_file = "{{ $v.TLS.CertFile }}"{{end}}
{{ if $v.TLS.KeyFile }}key_file = "{{ $v.TLS.KeyFile }}"{{end}}
{{ if $v.TLS.InsecureSkipVerify }}insecure_skip_verify = true{{end}}
{{end}}
{{end}}
{{end}}

View File

@@ -1,21 +0,0 @@
---
- name: container-runtime | add apt key
  ansible.builtin.apt_key:
    url: https://nvidia.github.io/nvidia-container-runtime/gpgkey
    state: present
- name: container-runtime | add apt repos
  ansible.builtin.apt_repository:
    repo: "{{ item }}"
    state: present
    mode: 0644
    update_cache: true
    filename: nvidia-container-runtime
  with_items:
    - "deb https://nvidia.github.io/libnvidia-container/stable/{{ ansible_distribution | lower }}{{ ansible_distribution_version }}/$(ARCH) /"
    - "deb https://nvidia.github.io/nvidia-container-runtime/stable/{{ ansible_distribution | lower }}{{ ansible_distribution_version }}/$(ARCH) /"
- name: container-runtime | install nvidia-container-runtime
  ansible.builtin.apt:
    name: "nvidia-container-runtime"
    state: present

View File

@@ -1,39 +0,0 @@
---
- name: driver | blacklist nouveau driver
  ansible.builtin.copy:
    src: files/blacklist-nouveau.conf
    dest: /etc/modprobe.d/blacklist-nouveau.conf
  register: blacklist
- name: driver | update initramfs
  ansible.builtin.command: "update-initramfs -u"
  when: blacklist.changed
- name: driver | reboot to unload nouveau
  ansible.builtin.reboot:
  when: blacklist.changed
- name: driver | install dkms build tools
  ansible.builtin.apt:
    name: "{{ item }}"
    state: present
  with_items:
    - "dkms"
    - "build-essential"
- name: driver | download nvidia driver
  ansible.builtin.get_url:
    url: https://international.download.nvidia.com/XFree86/Linux-x86_64/{{ nvidia_driver.version }}/NVIDIA-Linux-x86_64-{{ nvidia_driver.version }}.run
    dest: /tmp/NVIDIA-Linux-x86_64-{{ nvidia_driver.version }}.run
    checksum: "{{ nvidia_driver.checksum }}"
    mode: "0755"
- name: driver | install nvidia driver
  ansible.builtin.command:
    cmd: "/tmp/NVIDIA-Linux-x86_64-{{ nvidia_driver.version }}.run -s --no-opengl-files --dkms"
    creates: "/proc/driver/nvidia/version"
- name: driver | load nvidia driver
  modprobe:
    name: nvidia
    state: present

View File

@@ -1,13 +0,0 @@
---
- name: k3s-agent | enable nvidia-container-runtime
  ansible.builtin.copy:
    src: files/config.toml.tmpl
    dest: /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl
  register: containerd_config
- name: k3s-agent | restart agent
  service:
    name: k3s
    state: restarted
  when: containerd_config.changed

View File

@@ -1,17 +0,0 @@
---
- include: driver.yml
  tags:
    - driver
- include: patch.yml
  tags:
    - patch
- include: container-runtime.yml
  tags:
    - container-runtime
- include: k3s-agent.yml
  tags:
    - k3s-agent

View File

@@ -1,18 +0,0 @@
---
- name: patch | create patch directory
  ansible.builtin.file:
    path: /opt/nvidia-patch
    state: directory
    mode: '0755'
- name: patch | download nvidia-patch
  ansible.builtin.get_url:
    url: https://raw.githubusercontent.com/keylase/nvidia-patch/{{ nvidia_patch.version }}/patch.sh
    dest: /opt/nvidia-patch/patch.sh
    checksum: "{{ nvidia_patch.checksum }}"
    mode: '0755'
- name: patch | patch current nvidia driver
  ansible.builtin.command:
    cmd: /opt/nvidia-patch/patch.sh

View File

@@ -1,46 +0,0 @@
---
packages:
  apt_install:
    - apt-transport-https
    - arptables
    - ca-certificates
    - curl
    - ebtables
    - gdisk
    - hdparm
    - htop
    - iputils-ping
    - ipvsadm
    - net-tools
    - nfs-common
    - nano
    - ntpdate
    - open-iscsi
    - psmisc
    - socat
    - software-properties-common
    - unattended-upgrades
    - unzip
  apt_remove:
    - apport
    - bcache-tools
    - btrfs-progs
    - byobu
    - cloud-init
    - cloud-guest-utils
    - cloud-initramfs-copymods
    - cloud-initramfs-dyn-netconf
    - friendly-recovery
    - fwupd
    - landscape-common
    - lxd-agent-loader
    - ntfs-3g
    - open-vm-tools
    - plymouth
    - plymouth-theme-ubuntu-text
    - popularity-contest
    - snapd
    - sosreport
    - tmux
    - ubuntu-advantage-tools
    - ufw

View File

@@ -1,4 +0,0 @@
---
- name: reboot
  ansible.builtin.reboot:

View File

@@ -1,46 +0,0 @@
---
- name: boot | grub | check for existence of grub
  ansible.builtin.stat:
    path: /etc/default/grub
  register: grub_result
- name: boot | grub | set apparmor=0
  ansible.builtin.replace:
    path: /etc/default/grub
    regexp: '^(GRUB_CMDLINE_LINUX=(?:(?![" ]{{ option | regex_escape }}=).)*)(?:[" ]{{ option | regex_escape }}=\S+)?(.*")$'
    replace: '\1 {{ option }}={{ value }}\2'
  vars:
    option: apparmor
    value: 0
  when:
    - grub_result.stat.exists
  notify: reboot
- name: boot | grub | set mitigations=off
  ansible.builtin.replace:
    path: /etc/default/grub
    regexp: '^(GRUB_CMDLINE_LINUX=(?:(?![" ]{{ option | regex_escape }}=).)*)(?:[" ]{{ option | regex_escape }}=\S+)?(.*")$'
    replace: '\1 {{ option }}={{ value }}\2'
  vars:
    option: mitigations
    value: "off"
  when:
    - grub_result.stat.exists
  notify: reboot
- name: boot | grub | set pti=off
  ansible.builtin.replace:
    path: /etc/default/grub
    regexp: '^(GRUB_CMDLINE_LINUX=(?:(?![" ]{{ option | regex_escape }}=).)*)(?:[" ]{{ option | regex_escape }}=\S+)?(.*")$'
    replace: '\1 {{ option }}={{ value }}\2'
  vars:
    option: pti
    value: "off"
  when:
    - grub_result.stat.exists
  notify: reboot
- name: boot | grub | run grub-mkconfig
  ansible.builtin.command: grub-mkconfig -o /boot/grub/grub.cfg
  when:
    - grub_result.stat.exists

View File

@@ -1,21 +0,0 @@
---
- name: filesystem | sysctl | update max_user_watches
  ansible.posix.sysctl:
    name: fs.inotify.max_user_watches
    value: "65536"
    state: present
    sysctl_file: /etc/sysctl.d/98-kubernetes-fs.conf
- name: filesystem | swap | disable at runtime
  ansible.builtin.command: swapoff -a
  when: ansible_swaptotal_mb > 0
- name: filesystem | swap | disable on boot
  ansible.posix.mount:
    name: "{{ item }}"
    fstype: swap
    state: absent
  loop:
    - swap
    - none

View File

@@ -1,6 +0,0 @@
---
- name: host | hostname | update inventory hostname
  ansible.builtin.hostname:
    name: "{{ inventory_hostname }}"
  when:
    - ansible_hostname != inventory_hostname

View File

@@ -1,19 +0,0 @@
---
- name: kernel | modules | enable at runtime
  community.general.modprobe:
    name: "{{ item }}"
    state: present
  loop:
    - br_netfilter
    - overlay
    - rbd
- name: kernel | modules | enable on boot
  ansible.builtin.copy:
    mode: 0644
    content: "{{ item }}"
    dest: "/etc/modules-load.d/{{ item }}.conf"
  loop:
    - br_netfilter
    - overlay
    - rbd

View File

@@ -1,44 +0,0 @@
---
- name: locale | set timezone
  community.general.timezone:
    name: "{{ timezone | default('America/Chicago') }}"
- name: locale | copy timesyncd config
  ansible.builtin.copy:
    mode: 0644
    content: |
      [Time]
      NTP={{ ntp_servers.primary | default("") | join(" ") }}
      FallbackNTP={{ ntp_servers.fallback | join(" ") }}
    dest: /etc/systemd/timesyncd.conf
  when:
    - ntp_servers.primary is defined
    - ntp_servers.primary is iterable
    - ntp_servers.primary | length > 0
    - ntp_servers.fallback is defined
    - ntp_servers.fallback is iterable
    - ntp_servers.fallback | length > 0
- name: locale | start systemd service
  ansible.builtin.systemd:
    name: systemd-timesyncd
    enabled: true
    state: started
- name: locale | restart systemd service
  ansible.builtin.systemd:
    name: systemd-timesyncd
    daemon_reload: true
    enabled: true
    state: restarted
- name: locale | run timedatectl status
  ansible.builtin.command: /usr/bin/timedatectl show
  changed_when: false
  check_mode: false
  register: timedatectl_result
- name: locale | enable ntp
  ansible.builtin.command: /usr/bin/timedatectl set-ntp true
  when:
    - "'NTP=no' in timedatectl_result.stdout"

View File

@@ -1,48 +0,0 @@
---
- include: host.yml
  tags:
    - host
- include: locale.yml
  tags:
    - locale
- include: packages.yml
  tags:
    - packages
- include: power-button.yml
  tags:
    - power-button
- include: kernel.yml
  tags:
    - kernel
- include: boot.yml
  tags:
    - boot
- include: network.yml
  tags:
    - network
- include: filesystem.yml
  tags:
    - filesystem
- include: unattended-upgrades.yml
  tags:
    - unattended-upgrades
- include: user.yml
  tags:
    - user
- include: rsyslog.yml
  when:
    - rsyslog.enabled is defined
    - rsyslog.enabled
  tags:
    - rsyslog

View File

@@ -1,41 +0,0 @@
---
- name: network | check for bridge-nf-call-iptables
  ansible.builtin.stat:
    path: /proc/sys/net/bridge/bridge-nf-call-iptables
  register: bridge_nf_call_iptables_result
- name: network | sysctl | set config
  ansible.builtin.blockinfile:
    path: /etc/sysctl.d/99-kubernetes-cri.conf
    mode: 0644
    create: true
    block: |
      net.ipv4.ip_forward = 1
      net.bridge.bridge-nf-call-iptables = 1
  when:
    - bridge_nf_call_iptables_result.stat.exists
  register: sysctl_network
- name: network | sysctl | reload
  ansible.builtin.shell: sysctl -p /etc/sysctl.d/99-kubernetes-cri.conf
  when:
    - sysctl_network.changed
    - bridge_nf_call_iptables_result.stat.exists
- name: network | check for vm cloud-init config
  ansible.builtin.stat:
    path: /etc/netplan/50-cloud-init.yaml
  register: cloud_init_result
- name: network | set ceph interface mtu
  ansible.builtin.lineinfile:
    path: /etc/netplan/50-cloud-init.yaml
    regexp: '^\s*mtu'
    insertafter: '^\s*set-name: eth1'
    line: " mtu: 9000"
  register: netplan_apply
  when: cloud_init_result.stat.exists
- name: network | apply netplan
  ansible.builtin.shell: netplan apply
  when: netplan_apply.changed

View File

@@ -1,94 +0,0 @@
---
- name: packages | disable recommends
  ansible.builtin.blockinfile:
    path: /etc/apt/apt.conf.d/02norecommends
    mode: 0644
    create: true
    block: |
      APT::Install-Recommends "false";
      APT::Install-Suggests "false";
      APT::Get::Install-Recommends "false";
      APT::Get::Install-Suggests "false";
- name: packages | upgrade all packages
  ansible.builtin.apt:
    upgrade: full
    update_cache: true
    cache_valid_time: 3600
    autoclean: true
    autoremove: true
  register: apt_upgrade
  retries: 5
  until: apt_upgrade is success
  when:
    - (skip_upgrade_packages is not defined or (skip_upgrade_packages is defined and not skip_upgrade_packages))
- name: packages | install common
  ansible.builtin.apt:
    name: "{{ packages.apt_install }}"
    install_recommends: false
    update_cache: true
    cache_valid_time: 3600
    autoclean: true
    autoremove: true
  register: apt_install_common
  retries: 5
  until: apt_install_common is success
  when:
    - packages.apt_install is defined
    - packages.apt_install is iterable
    - packages.apt_install | length > 0
- name: packages | remove crufty packages
  block:
    - name: packages | remove crufty packages | gather install packages
      ansible.builtin.package_facts:
        manager: auto
      when:
        - "'snapd' in packages.apt_remove"
    - name: packages | remove crufty packages | check if snap is installed
      ansible.builtin.debug:
        msg: "snapd is installed"
      register: snapd_check
      when:
        - "'snapd' in packages.apt_remove"
        - "'snapd' in ansible_facts.packages"
    - name: packages | remove crufty packages | remove snap packages
      ansible.builtin.command: snap remove {{ item }}
      loop:
        - lxd
        - core18
        - snapd
      when:
        - "'snapd' in packages.apt_remove"
        - "'snapd' in ansible_facts.packages"
        - snapd_check.failed is defined
    - name: packages | remove crufty packages | remove packages
      ansible.builtin.apt:
        name: "{{ packages.apt_remove }}"
        state: absent
        autoremove: true
    - name: packages | remove crufty packages | remove crufty files
      ansible.builtin.file:
        state: absent
        path: "{{ item }}"
      loop:
        - "/home/{{ ansible_user }}/.snap"
        - "/snap"
        - "/var/snap"
        - "/var/lib/snapd"
        - "/var/cache/snapd"
        - "/usr/lib/snapd"
        - "/etc/cloud"
        - "/var/lib/cloud"
      when:
        - "'snapd' in packages.apt_remove"
        - "'cloud-init' in packages.apt_remove"
  when:
    - packages.apt_remove is defined
    - packages.apt_remove is iterable
    - packages.apt_remove | length > 0
    - (skip_remove_packages is not defined or (skip_remove_packages is defined and not skip_remove_packages))

View File

@@ -1,15 +0,0 @@
---
- name: power-button | disable single power button press shutdown
  ansible.builtin.lineinfile:
    path: /etc/systemd/logind.conf
    regexp: "{{ item.setting }}"
    line: "{{ item.setting }}={{ item.value }}"
  loop:
    - { setting: HandlePowerKey, value: ignore }
- name: power-button | restart logind systemd service
  ansible.builtin.systemd:
    name: systemd-logind.service
    daemon_reload: true
    enabled: true
    state: restarted

View File

@@ -1,20 +0,0 @@
---
- name: rsyslog
  block:
    - name: rsyslog | copy promtail configuration
      ansible.builtin.template:
        src: "rsyslog-50-promtail.conf.j2"
        dest: "/etc/rsyslog.d/50-promtail.conf"
        mode: 0644
    - name: rsyslog | start systemd service
      ansible.builtin.systemd:
        name: rsyslog
        enabled: true
        state: started
    - name: rsyslog | restart systemd service
      ansible.builtin.systemd:
        name: rsyslog.service
        daemon_reload: true
        enabled: true
        state: restarted

View File

@@ -1,38 +0,0 @@
---
- name: unattended-upgrades | copy 20auto-upgrades config
  ansible.builtin.blockinfile:
    path: /etc/apt/apt.conf.d/20auto-upgrades
    mode: 0644
    create: true
    block: |
      APT::Periodic::Update-Package-Lists "14";
      APT::Periodic::Download-Upgradeable-Packages "14";
      APT::Periodic::AutocleanInterval "7";
      APT::Periodic::Unattended-Upgrade "1";
- name: unattended-upgrades | copy 50unattended-upgrades config
  ansible.builtin.blockinfile:
    path: /etc/apt/apt.conf.d/50unattended-upgrades
    mode: 0644
    create: true
    block: |
      Unattended-Upgrade::Automatic-Reboot "false";
      Unattended-Upgrade::Remove-Unused-Dependencies "true";
      Unattended-Upgrade::Allowed-Origins {
          "${distro_id}:${distro_codename}";
          "${distro_id} ${distro_codename}-security";
      };
- name: unattended-upgrades | start systemd service
  ansible.builtin.systemd:
    name: unattended-upgrades
    enabled: true
    state: started
- name: unattended-upgrades | restart systemd service
  ansible.builtin.service:
    name: unattended-upgrades.service
    daemon_reload: true
    enabled: true
    state: restarted

View File

@@ -1,28 +0,0 @@
---
- name: user | add to sudoers
  ansible.builtin.copy:
    content: "{{ ansible_user }} ALL=(ALL:ALL) NOPASSWD:ALL"
    dest: "/etc/sudoers.d/{{ ansible_user }}_nopasswd"
    mode: "0440"
- name: user | add additional SSH public keys
  ansible.posix.authorized_key:
    user: "{{ ansible_user }}"
    key: "{{ item }}"
  loop: "{{ ssh_authorized_keys }}"
  when:
    - ssh_authorized_keys is defined
    - ssh_authorized_keys is iterable
    - ssh_authorized_keys | length > 0
- name: user | check if hushlogin exists
  ansible.builtin.stat:
    path: "/home/{{ ansible_user }}/.hushlogin"
  register: hushlogin_result
- name: user | silence the login prompt
  ansible.builtin.file:
    dest: "/home/{{ ansible_user }}/.hushlogin"
    state: touch
    owner: "{{ ansible_user }}"
    mode: "0775"

View File

@@ -1,4 +0,0 @@
module(load="omprog")
module(load="mmutf8fix")
action(type="mmutf8fix" replacementChar="?")
action(type="omfwd" protocol="tcp" target="{{ rsyslog.ip }}" port="{{ rsyslog.port }}" Template="RSYSLOG_SyslogProtocol23Format" TCP_Framing="octet-counted" KeepAlive="on")

View File

@@ -73,7 +73,7 @@ class Talos {
console.log(`Waiting for Talos apid to be available`)
await sleep(30000)
let healthCheck = await retry(30, expBackoff(), () => $`curl -k https://${nodeConfig.ipAddress}:50000`)
let healthCheck = await retry(30, expBackoff(), () => $`nc -z ${nodeConfig.ipAddress} 50000`)
if (await healthCheck.exitCode === 0) {
console.log(`${chalk.green.bold('Success:')} You can now push a machine config to ${this.nodes}`)
}
@@ -86,7 +86,7 @@ class Talos {
// Set TESMART switch channel
async setChannel(headers, channel) {
const response = await fetch(`${this.proto}://${this.kvm}/api/gpio/pulse?channel=server${channel}_switch`, { method: 'POST', headers })
const response = await fetch(`${this.proto}://${this.kvm}/api/gpio/pulse?channel=server${--channel}_switch`, { method: 'POST', headers })
if (!response.ok) {
const json = await response.json()
throw new Error(`${json.result.error} - ${json.result.error_msg}`)
@@ -140,19 +140,19 @@ class Talos {
// Send CTRL-ALT-DEL to piKVM
async sendReboot(headers) {
await Promise.all([
fetch(`${this.proto}://${this.kvm}/api/hid/events/send_key?key=ControlLeft&state=true`, { method: 'POST', headers }),
fetch(`${this.proto}://${this.kvm}/api/hid/events/send_key?key=AltLeft&state=true`, { method: 'POST', headers }),
fetch(`${this.proto}://${this.kvm}/api/hid/events/send_key?key=Delete&state=true`, { method: 'POST', headers }),
])
// await Promise.all([
await fetch(`${this.proto}://${this.kvm}/api/hid/events/send_key?key=ControlLeft&state=true`, { method: 'POST', headers })
await fetch(`${this.proto}://${this.kvm}/api/hid/events/send_key?key=AltLeft&state=true`, { method: 'POST', headers })
await fetch(`${this.proto}://${this.kvm}/api/hid/events/send_key?key=Delete&state=true`, { method: 'POST', headers })
// ])
await sleep(500)
await sleep(2000)
await Promise.all([
fetch(`${this.proto}://${this.kvm}/api/hid/events/send_key?key=ControlLeft&state=false`, { method: 'POST', headers }),
fetch(`${this.proto}://${this.kvm}/api/hid/events/send_key?key=AltLeft&state=false`, { method: 'POST', headers }),
fetch(`${this.proto}://${this.kvm}/api/hid/events/send_key?key=Delete&state=false`, { method: 'POST', headers }),
])
// await Promise.all([
await fetch(`${this.proto}://${this.kvm}/api/hid/events/send_key?key=ControlLeft&state=false`, { method: 'POST', headers })
await fetch(`${this.proto}://${this.kvm}/api/hid/events/send_key?key=AltLeft&state=false`, { method: 'POST', headers })
await fetch(`${this.proto}://${this.kvm}/api/hid/events/send_key?key=Delete&state=false`, { method: 'POST', headers })
// ])
}
}

View File

@@ -18,6 +18,3 @@ bgp:
hubble:
  enabled: false
ipv6:
  enabled: false

View File

@@ -1,34 +0,0 @@
k8s:
  user_password: ENC[AES256_GCM,data:4EVJWH/7,iv:35y9BQXaKcPQMfr5vtcHiWQqV1MZIKchzHExTg0d3H0=,tag:pKBxiQkeHaLvLS0BdI786w==,type:str]
  ssh_key: ENC[AES256_GCM,data:q+atXCWn5fSACn9SHOIHxHqHzM7RnySUcJBJzwF18RbEwzSOHthdbLwhYbgy0jiyWollUx6q0OZYKrkNovR+GecqRQZkjPpqBZly/ktMLOc=,iv:bYJcLXeZ5GtgjvOKAeS/JXi+Elx6DIfA4cy9lcAOIJ0=,tag:oZpTLWaUlImDxSxf734oUA==,type:str]
sops:
  kms: []
  gcp_kms: []
  azure_kv: []
  hc_vault: []
  age: []
  lastmodified: "2021-05-11T19:00:29Z"
  mac: ENC[AES256_GCM,data:EVEYk/NvrM0zZcpuOCl5Ums1vYzgRQtPzPwAz1WTi3d5CEZFQJxps9N1FsFeVqE65o7myqzOZ2EI/ImVAJsq+8RbpXCbkIBffYcMPqL8KrJsy0nRAKNZtmeYdB1jRthGeW20vzbIAbU//LxNFB3Pm4wmRg3/VFF9UcgT/Ka6MKU=,iv:neN8mfBCW9LYWR1kHdXWu/pW6896Z7+jyCNxPJJGBWc=,tag:3n5Oo9/ZHXziYf3x1P+uzA==,type:str]
  pgp:
    - created_at: "2021-05-11T18:53:16Z"
      enc: |
        -----BEGIN PGP MESSAGE-----
        hQIMAySEZvKqXwiCAQ//TuGQ7qa8uA3hhn4DpzBPwiR6js536rNwNESIPFlRJSYc
        NC7sxhEHdQNeQycbf39uVhQKqF3m9HcT5KePbWkMW+WwjrHOO2rXwDO8AUY8zzdM
        COU7s/UeZbTr7kWNGEun6mfPKbQ0nEdCnbXfktbBZJp2vkjFKpnm4Ibnam2D7c8K
        j50T+0WJXROpNeF/eVqXCGKHZu7BUqGTuyldHRTMFlxufJzOw1ZeIoN2hXEGR4kT
        4OBYCzkAmf0/EMKSQnOwJfSFKfTMaNsYP+fB3kh1fwQlobsdtVDinNLnnFFx1RIS
        DSgnsqtxqslNleTNmngqCPL+AISlfuA5oUa6W31IZDK/1amd7h2A/XuLl885NrSp
        /CEJosxmmU7lMSwUXLZVi3RyLxJvQWRwvKmzlCwnC3L10WfWuTjUV7Wdb1+SoqBB
        gFp/hRYl8I7MIzRJREsjCJmdKYbF2KfJYOtPo7Aht9mmiczgZCJBzwXhE/ZLIHCj
        /lyIzogT8h/R9bgRlfZS6qd+okSXKL28K7gBcFZ/7rK0gf+mhtgbxS3emGwoXOa1
        JWe+B7m4BK39lZ3vWs0bty3u0tjbb6f40pk2/0EB6tSq3pw0Zqqv4SG5vxaJaOGh
        7ICp7m9IMMmhfJ79nvOSwpGORIz/MS3jvxEcWxtnrB7VHMsku0HgdGpQamjZ95jS
        XgFX/HLN49ASpyiaF47IPjqtLGX2tJ2lOJg9tYGDayOJGRbL8xJCCDwPjKn5KwOe
        gT75OaBWsWvAKCuixi0AF6jS/kC/BXD6k8yDm5uee47kA5bnUdY5Tz9yK77jELw=
        =BzQi
        -----END PGP MESSAGE-----
      fp: 0E883B2F1196288130061C6BA8B44BCF50372B6B
  unencrypted_suffix: _unencrypted
  version: 3.7.1

View File

@@ -1,40 +0,0 @@
# Preparing Ubuntu cloudinit image for Proxmox
### Download Ubuntu 20.04 cloudimg
`wget https://cloud-images.ubuntu.com/focal/current/focal-server-cloudimg-amd64.img`
### Install libguestfs-tools on Proxmox server.
`apt-get install libguestfs-tools`
### Install qemu-guest-agent on Ubuntu image.
`virt-customize -a focal-server-cloudimg-amd64.img --install qemu-guest-agent`
### Enable password authentication in the template. Obviously, not recommended except for testing.
`virt-customize -a focal-server-cloudimg-amd64.img --run-command "sed -i 's/.*PasswordAuthentication.*/PasswordAuthentication yes/g' /etc/ssh/sshd_config"`
### Set environment variables. Change these as necessary.
```sh
export STORAGE_POOL="local-lvm"
export VM_ID="10000"
export VM_NAME="ubuntu-20.04-cloudimg"
```
### Create Proxmox VM image from Ubuntu Cloud Image.
```sh
qm create $VM_ID --memory 2048 --net0 virtio,bridge=vmbr0
qm importdisk $VM_ID focal-server-cloudimg-amd64.img $STORAGE_POOL
qm set $VM_ID --scsihw virtio-scsi-pci --scsi0 $STORAGE_POOL:vm-$VM_ID-disk-0
qm set $VM_ID --agent enabled=1,fstrim_cloned_disks=1
qm set $VM_ID --name $VM_NAME
```
### Create Cloud-Init Disk and configure boot.
```sh
qm set $VM_ID --ide2 $STORAGE_POOL:cloudinit
qm set $VM_ID --boot c --bootdisk scsi0
qm set $VM_ID --serial0 socket --vga serial0
qm template $VM_ID
rm focal-server-cloudimg-amd64.img
```

View File

@@ -1,24 +0,0 @@
terraform {
  required_version = ">= 0.13.0"
  required_providers {
    proxmox = {
      source  = "Telmate/proxmox"
      version = "2.9.0"
    }
    sops = {
      source  = "carlpett/sops"
      version = "0.6.3"
    }
  }
}
provider "proxmox" {
  pm_tls_insecure = true
  pm_api_url      = "https://10.75.30.20:8006/api2/json"
  pm_user         = "root@pam"
  pm_parallel     = 4
}
provider "sops" {}

View File

@@ -1,47 +0,0 @@
resource "proxmox_vm_qemu" "kube-master" {
for_each = var.masters
name = each.key
target_node = each.value.target_node
agent = 1
clone = var.common.clone
vmid = each.value.id
memory = each.value.memory
cores = each.value.cores
vga {
type = "qxl"
}
network {
model = "virtio"
macaddr = each.value.macaddr
bridge = "vmbr0"
tag = 40
firewall = true
}
network {
model = "virtio"
bridge = "vmbr1"
}
disk {
type = "scsi"
storage = "fast-pool"
size = each.value.disk
format = "raw"
ssd = 1
discard = "on"
}
serial {
id = 0
type = "socket"
}
bootdisk = "scsi0"
scsihw = "virtio-scsi-pci"
os_type = "cloud-init"
ipconfig0 = "ip=${each.value.cidr},gw=${each.value.gw}"
ipconfig1 = "ip=${each.value.ceph_cidr}"
ciuser = "ubuntu"
cipassword = data.sops_file.secrets.data["k8s.user_password"]
searchdomain = var.common.search_domain
nameserver = var.common.nameserver
sshkeys = data.sops_file.secrets.data["k8s.ssh_key"]
}

View File

View File

@@ -1,3 +0,0 @@
data "sops_file" "secrets" {
source_file = ".secrets.yaml"
}

View File

@@ -1,99 +0,0 @@
variable "common" {
type = map(string)
default = {
os_type = "ubuntu"
clone = "ubuntu-20.04-cloudimg"
search_domain = "dfw.56k.sh 56k.sh"
nameserver = "10.75.0.1"
}
}
variable "masters" {
type = map(map(string))
default = {
k8s-master01 = {
id = 4010
cidr = "10.75.40.10/24"
ceph_cidr = "10.75.33.40/24"
cores = 8
gw = "10.75.40.1"
macaddr = "02:DE:4D:48:28:01"
memory = 8192
disk = "40G"
target_node = "pve01"
},
k8s-master02 = {
id = 4011
cidr = "10.75.40.11/24"
ceph_cidr = "10.75.33.41/24"
cores = 8
gw = "10.75.40.1"
macaddr = "02:DE:4D:48:28:02"
memory = 8192
disk = "40G"
target_node = "pve02"
},
k8s-master03 = {
id = 4012
cidr = "10.75.40.12/24"
ceph_cidr = "10.75.33.42/24"
cores = 8
gw = "10.75.40.1"
macaddr = "02:DE:4D:48:28:03"
memory = 8192
disk = "40G"
target_node = "pve03"
}
}
}
variable "workers" {
type = map(map(string))
default = {
k8s-worker01 = {
id = 4020
cidr = "10.75.40.20/24"
ceph_cidr = "10.75.33.50/24"
cores = 16
gw = "10.75.40.1"
macaddr = "02:DE:4D:48:28:0A"
memory = 16384
disk = "40G"
target_node = "pve01"
},
k8s-worker02 = {
id = 4021
cidr = "10.75.40.21/24"
ceph_cidr = "10.75.33.51/24"
cores = 16
gw = "10.75.40.1"
macaddr = "02:DE:4D:48:28:0B"
memory = 16384
disk = "40G"
target_node = "pve02"
},
k8s-worker03 = {
id = 4022
cidr = "10.75.40.22/24"
ceph_cidr = "10.75.33.52/24"
cores = 16
gw = "10.75.40.1"
macaddr = "02:DE:4D:48:28:0C"
memory = 16384
disk = "40G"
target_node = "pve03"
},
k8s-worker04 = {
id = 4023
cidr = "10.75.40.23/24"
ceph_cidr = "10.75.33.53/24"
cores = 16
gw = "10.75.40.1"
macaddr = "02:DE:4D:48:28:0D"
memory = 16384
disk = "40G"
target_node = "pve04"
},
}
}

View File

@@ -1,47 +0,0 @@
resource "proxmox_vm_qemu" "kube-worker" {
for_each = var.workers
name = each.key
target_node = each.value.target_node
agent = 1
clone = var.common.clone
vmid = each.value.id
memory = each.value.memory
cores = each.value.cores
vga {
type = "qxl"
}
network {
model = "virtio"
macaddr = each.value.macaddr
bridge = "vmbr0"
tag = 40
firewall = true
}
network {
model = "virtio"
bridge = "vmbr1"
}
disk {
type = "scsi"
storage = "rust-pool"
size = each.value.disk
format = "raw"
ssd = 1
discard = "on"
}
serial {
id = 0
type = "socket"
}
bootdisk = "scsi0"
scsihw = "virtio-scsi-pci"
os_type = "cloud-init"
ipconfig0 = "ip=${each.value.cidr},gw=${each.value.gw}"
ipconfig1 = "ip=${each.value.ceph_cidr}"
ciuser = "ubuntu"
cipassword = data.sops_file.secrets.data["k8s.user_password"]
searchdomain = var.common.search_domain
nameserver = var.common.nameserver
sshkeys = data.sops_file.secrets.data["k8s.ssh_key"]
}