* Several v6 SKU types come with ephemeral OS disks with NVMe, so
you get faster local storage and avoid managed disk costs
* Ensure `worker_disk_size` is set to a size appropriate for the
SKU's ephemeral storage, since you pay for that capacity either way
(as sketched below)
* Requires https://github.com/hashicorp/terraform-provider-azurerm/pull/30044
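A minimal sketch of opting a worker pool into ephemeral local disks. The SKU name and disk size are illustrative (check your SKU's actual ephemeral capacity), and the `worker_ephemeral_disk` / `worker_disk_size` variables are the ones introduced later in this changelog:
```
module "cluster" {
  # ...

  # illustrative v6 SKU with local NVMe storage
  worker_type = "Standard_D4ds_v6"

  # Place the worker OS disk on the SKU's ephemeral local storage and
  # size it to fit within that capacity (paid for as part of the SKU)
  worker_ephemeral_disk = true
  worker_disk_size      = 100
}
```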
* Set a rolling upgrade policy so that changes to the worker node
pool are rolled out gradually. Previously, the VMSS model could
change, but instances would not receive it until manually replaced
* Align Azure node pool behaviors more closely with AWS and GCP:
* On AWS, worker instance template changes trigger an instance refresh
* On GCP, worker instance template changes roll out via a proactive
update policy
* Define Azure automatic instance repair using Application Health
Extension probes to port 10256 (kube-proxy healthz, or its Cilium
equivalent) to match the strategy used on Google Cloud
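For orientation, a hedged sketch of what these settings look like in `azurerm` terms. It uses the uniform `azurerm_linux_virtual_machine_scale_set` resource for brevity with illustrative thresholds; the module's actual resource (flexible orchestration) and values may differ:
```
resource "azurerm_linux_virtual_machine_scale_set" "workers" {
  # ... name, resource group, sku, image, and networking omitted ...

  # Roll out scale set model changes to instances gradually
  upgrade_mode = "Rolling"
  rolling_upgrade_policy {
    max_batch_instance_percent              = 20
    max_unhealthy_instance_percent          = 20
    max_unhealthy_upgraded_instance_percent = 20
    pause_time_between_batches              = "PT1M"
  }

  # Replace instances whose health probe fails
  automatic_instance_repair {
    enabled      = true
    grace_period = "PT30M"
  }

  # Application Health Extension probing the node's kube-proxy
  # (or Cilium replacement) health port
  extension {
    name                 = "ApplicationHealthLinux"
    publisher            = "Microsoft.ManagedServices"
    type                 = "ApplicationHealthLinux"
    type_handler_version = "1.0"
    settings = jsonencode({
      protocol = "tcp"
      port     = 10256
    })
  }
}
```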
* Azure Load Balancers charge by load balancer rules (5 included),
so it's useful to provide ways to stay under that number, either
by dropping support for port 80 (HTTP) traffic or for IPv6 traffic.
When using global proxies, you can usually serve IPv6 or HTTP->HTTPS
redirects separately anyway
* When using spot instances, deleting an instance actually lowers
the desired number of nodes in the VMSS, so the node is not replaced
* Restore the auto-scale setting needed to maintain a consistent
desired number of workers while spot instances come and go. This
was mistakenly removed in a refactor
* Azure Load Balancers include 5 rules (3 LB rules, 2 outbound) whether used or not
* [#1468](https://github.com/poseidon/typhoon/pull/1468) added 3 LB rules to support IPv6 load balancing,
raising the rule count from 5 to 8 and adding ~$21/mo to the cost of the load balancer. If you use an edge
(e.g. Cloudflare), a cluster does not need to load balance IPv6, so this additional cost can be avoided
* I noticed this because my load balancing costs were up for the last
few months. The gotcha is that outbound rules count toward the 5 rules
included with the base cost of the LB (~$18/mo)
Docs: https://azure.microsoft.com/en-us/pricing/details/load-balancer/
* flannel and Cilium default to UDP 8472 for VXLAN traffic to
avoid conflicts with other VXLAN usage (e.g. Open vSwitch)
* Aligning flannel and Cilium on the same VXLAN port makes
firewall rules and security policies simpler across clouds
Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/403
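As an illustration of the simplification, a single NSG rule can cover VXLAN for either CNI. The resource group, NSG name, priority, and prefixes below are assumptions for the example, not the module's actual rule:
```
resource "azurerm_network_security_rule" "worker-vxlan" {
  # names, priority, and prefixes are illustrative
  resource_group_name         = "my-cluster"
  network_security_group_name = "my-cluster-worker"

  name                       = "allow-vxlan"
  priority                   = 2020
  access                     = "Allow"
  direction                  = "Inbound"
  protocol                   = "Udp"
  source_port_range          = "*"
  destination_port_range     = "8472"
  source_address_prefix      = "10.0.0.0/16"
  destination_address_prefix = "10.0.0.0/16"
}
```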
* Cilium has been the default for about 3 years and is the de facto
standard CNI choice. flannel is supported as a simple alternative
* Remove various historical options that were specific to Calico
* By default, the Kubelet pulls container images one by one
(in series), a default that mostly stems from Docker-era bugs
with parallel image pulls. These days we use containerd, so
parallel pulls should be fine
* Serial image pulls are undesirable because one slow registry
or image can cause other image pulls to wait. Parallel image
pulls ensure only large images / slow registries see that impact
Docs: https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/
* Change the default Pod CIDR from 10.2.0.0/16 to 10.20.0.0/14
(10.20.0.0 - 10.23.255.255) to support 1024 nodes by default
* Most CNI providers divide the Pod CIDR so that each node has
a /24 to allocate to local pods (256 addresses). The previous `10.2.0.0/16`
default only fits 256 /24s, so only 256 nodes were supported without
customizing `pod_cidr`
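The new default can still be overridden; a sketch of setting `pod_cidr` explicitly, with the node math as a comment:
```
module "cluster" {
  # ...

  # a /14 contains 2^(24-14) = 1024 per-node /24 ranges
  pod_cidr = "10.20.0.0/14"
}
```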
* When using the Cilium component, disable bootstrapping the
kube-proxy DaemonSet. Instead, configure Cilium to provide its
kube-proxy replacement with BPF
* Update the self-managed Cilium component to use kube-proxy
replacement as well
* Set reasonable values and remove some variable clutter
* `enable_reporting` is only used with Calico and we can just default
to false; I doubt anyone uses Calico and cares much about reporting
metrics to upstream Calico
* Drop support for `cluster_domain_suffix` customization and
always use `cluster.local`. Many components in the Kubernetes
ecosystem assume this default suffix and it's very rare to set
a special value here these days
* Cleanup a few variables that are seldom used
* On platforms that support ARM64 instances, configure controller
and worker node host architectures separately
* For example, you can run arm64 controllers and amd64 workers
* Add `controller_arch` and `worker_arch` variables
* Remove `arch` variable
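A sketch of mixing architectures with the new variables; the "arm64"/"amd64" values assume those are the accepted strings:
```
module "cluster" {
  # ...

  # arm64 controllers with amd64 workers
  controller_arch = "arm64"
  worker_arch     = "amd64"
}
```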
* Use flexible orchestration mode. Azure has started to recommend this
mode because it allows interacting with VMSS instances like regular VMs
via the CLI or via the Azure Portal
* Add options to allow worker nodes to use ephemeral local disks
* Add `controller_disk_type` and `controller_disk_size` variables
* Add `worker_disk_type`, `worker_disk_size`, and `worker_ephemeral_disk` variables
* Consolidate load balancer frontend IPs to just the minimal IPv4
and IPv6 addresses that are needed per load balancer. The apiserver and
ingress use separate ports, so there is no real need for a separate
public IPv4 address just for the apiserver
* Some might prefer a separate IP just because it slightly hides the
apiserver, but these are public hosted endpoints that can be discovered
* Reduce the cost of an Azure cluster since IPv4 public IPs are billed
($3.60/mo/cluster)
* Rename the `region` variable to `location` to align with Azure
platform conventions, where resources are created within an
Azure location, and locations are themselves part of broader
geographical regions
* Define a dual-stack virtual network with both IPv4 and IPv6 private
address space. Change the `host_cidr` variable (string) to a `network_cidr`
variable (object) with "ipv4" and "ipv6" fields that list CIDR strings
(see the sketch after the IPv6 checklist below).
* Define dual-stack controller and worker subnets. Disable Azure
default outbound access (a deprecated fallback mechanism)
* Enable dual-stack load balancing to Kubernetes Ingress by adding
a public IPv6 frontend IP and LB rule to the load balancer.
* Enable worker outbound IPv6 connectivity through load balancer
SNAT by adding an IPv6 frontend IP and outbound rule
* Configure controller nodes with a public IPv6 address to provide
direct outbound IPv6 connectivity
* Add an IPv6 worker backend pool. Azure requires separate IPv4 and
IPv6 backend pools, though the health probe can be shared
* Extend network security group rules for IPv6 source/destinations
Checklist:
Access to controller and worker nodes via IPv6 addresses:
* SSH access to controller nodes via public IPv6 address
* SSH access to worker nodes via (private) IPv6 address (via
controller)
Outbound IPv6 connectivity from controller and worker nodes:
```
nc -6 -zv ipv6.google.com 80
Ncat: Version 7.94 ( https://nmap.org/ncat )
Ncat: Connected to [2607:f8b0:4001:c16::66]:80.
Ncat: 0 bytes sent, 0 bytes received in 0.02 seconds.
```
Serving Ingress traffic via IPv4 or IPv6 just requires setting
up A and AAAA records and running the ingress controller with
`hostNetwork: true`, since hostPort only forwards IPv4 traffic
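A sketch of the dual-stack inputs referenced above; the `location` value and CIDR ranges are illustrative (the IPv6 range is just an example ULA prefix):
```
module "cluster" {
  # ...

  # Azure location (formerly the region variable)
  location = "centralus"

  # replaces the old host_cidr string
  network_cidr = {
    ipv4 = ["10.0.0.0/16"]
    ipv6 = ["fd9b:1a2b:3c4d::/48"]
  }
}
```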
* Previously: Typhoon provisions clusters with kube-system components
like CoreDNS, kube-proxy, and a chosen CNI provider (among flannel,
Calico, or Cilium) pre-installed. This is convenient since clusters
come with "batteries included". But it also means upgrading these
components is generally done in lock-step, by upgrading to a new
Typhoon / Kubernetes release
* It can be valuable to manage these components with a separate
plan/apply process or through automations and deploy systems. For
example, this allows managing CoreDNS separately from the cluster's
lifecycle.
* These "components" will continue to be pre-installed by default,
but a new `components` variable allows them to be disabled and
managed as "addons", components you apply after cluster creation
and manage on a rolling basis. For some of these, we may provide
Terraform modules to aid in managing these components.
```
module "cluster" {
# defaults
components = {
enable = true
coredns = {
enable = true
}
kube_proxy = {
enable = true
}
# Only the CNI set in var.networking will be installed
flannel = {
enable = true
}
calico = {
enable = true
}
cilium = {
enable = true
}
}
}
```
An earlier variable `install_container_networking = true/false` has
been removed, since the same result can now be achieved with this more
extensible and general components mechanism by setting the chosen
networking provider's enable field to false.
* Output the network security group name and address prefixes
for controller nodes, to allow adding custom network security
rules that apply specifically to controller nodes
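A hedged sketch of consuming such outputs to attach a custom controller rule; the output names (`controller_security_group_name`, `controller_address_prefixes`), resource group, port, and priority are hypothetical stand-ins for illustration:
```
resource "azurerm_network_security_rule" "controller-metrics" {
  resource_group_name = "my-cluster"
  # hypothetical output name for the controller NSG
  network_security_group_name = module.cluster.controller_security_group_name

  name                   = "allow-custom-metrics"
  priority               = 2100
  access                 = "Allow"
  direction              = "Inbound"
  protocol               = "Tcp"
  source_port_range      = "*"
  destination_port_range = "9100"
  # hypothetical output name for controller address prefixes
  source_address_prefixes      = module.cluster.controller_address_prefixes
  destination_address_prefixes = module.cluster.controller_address_prefixes
}
```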
* Add firewall or security rules to allow node-to-node traffic
on ports 9962-9965 for Cilium and Hubble metrics. Cilium runs
with the host network, so these require cloud firewall changes