* Migration from a self-hosted to a static pod control plane dropped
a few kube-controller-manager customizations
* Reduce kube-controller-manager --pod-eviction-timeout from 5m to 1m
to move pods more quickly when nodes are preempted
* Fix flex-volume-plugin-dir since the Kubernetes default points to
a read-only filesystem on Container Linux / Fedora CoreOS
Related:
* https://github.com/poseidon/terraform-render-bootstrap/pull/148
* 7b06557b7a
* Reduce kube-apiserver and kube-controller-manager CPU
requests from 200m to 150m. Prefer slightly lower commitment
after running with the requests chosen in #161 for a while
* Reduce calico-node CPU request from 150m to 100m to match
CoreDNS and flannel
* Annotate terraform output variables containing generated TLS
credentials and kubeconfigs as sensitive to suppress / mask
them in terraform CLI display.
* Allows easier use in automation systems and logged environments
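A minimal sketch of the pattern, with a hypothetical output name
standing in for the module's TLS credential and kubeconfig outputs:

```hcl
locals {
  kubeconfig_admin = "..." # stands in for a rendered kubeconfig
}

# Marking the output as sensitive suppresses its value in
# `terraform plan` and `terraform apply` display.
output "kubeconfig-admin" {
  value     = local.kubeconfig_admin
  sensitive = true
}
```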
* `asset_dir` is an absolute path to a directory where generated
assets from terraform-render-bootstrap are written (sensitive)
* Change `asset_dir` to default to "" so no assets are written
(favor Terraform output mechanisms). Previously, asset_dir was
required, so all users set some path. To take advantage of the
new optionality, remove asset_dir or set it to "" (see the
sketch after this entry)
* Introduce a new `assets_dist` output variable that provides
a mapping from suggested asset paths to asset contents (for
assets that should be distributed to controller nodes). This
new output format is intended to align with a modified asset
distribution style in Typhoon.
* Lay the groundwork for `asset_dir` to become optional. The
output map provides output variable access to the minimal assets
that are required for bootstrap
* Assets that aren't required for bootstrap itself (e.g.
the etcd CA key) but can be used by admins may later be added
as specific output variables to further reduce asset_dir use
Background:
* `terraform-render-bootstrap` rendered assets were previously
only provided by rendering files to an `asset_dir`. This was
necessary, but created a responsibility to maintain those
assets on the machine where `terraform apply` was run
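A sketch of relying on outputs instead of `asset_dir`, per the
change above; module inputs are elided and the asset-writing
resource is illustrative, not part of the module:

```hcl
module "bootstrap" {
  source = "git::https://github.com/poseidon/terraform-render-bootstrap.git"

  cluster_name = "example"
  # ...other required inputs elided
  # asset_dir omitted; it defaults to "" so no assets are written
}

# Materialize assets only if needed (e.g. to copy to controllers).
resource "local_file" "assets" {
  for_each = module.bootstrap.assets_dist

  filename = "assets/${each.key}" # each.key is a suggested path
  content  = each.value           # each.value is the asset contents
}
```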
* Remove unused `ca_cert`, `kubelet_cert`, `kubelet_key`,
and `server` outputs
* These outputs were once needed to support clusters with
managed instance groups, but that hasn't been the case for
quite some time
* Set small CPU requests on static pods kube-apiserver,
kube-controller-manager, and kube-scheduler to align with
upstream tooling and for edge cases
* Control plane nodes are tainted to isolate them from
ordinary workloads. Even dense workloads can only compress
CPU resources on worker nodes.
* Control plane static pods use the highest priority class, so
contention favors control plane pods (over, say, node-exporter)
and CPU is compressible too.
* Effectively, a practical case for these requests hasn't been
observed. However, a small static pod CPU request may offer
a slight benefit if a controller became overloaded and the
above mechanisms were insufficient for some reason (bit of a
stretch, due to CPU compressibility)
* Continue to avoid setting a memory request for static pods.
It would impose a hard size requirement on controller nodes,
which isn't warranted; sizing is handled more gently by Typhoon's
default instance types across clouds and by documentation
* Adopt Terraform v0.12 type system and `templatefile` function
features to replace the use of terraform-provider-template's
`template_dir` (see the sketch below)
* Use of `for_each` to write local assets requires
that consumers use Terraform v0.12.6+ (action required)
* Continue use of `template_file` as it's quite common. In the
future, we may replace it as well.
* Remove outputs `id` and `content_hash` (no longer used)
Background:
* `template_dir` was added to `terraform-provider-template`
to add support for template directory rendering in CoreOS
Tectonic Kubernetes distribution (~2017)
* Terraform v0.12 introduced a native `templatefile` function
and v0.12.6 introduced native `for_each` support (July 2019),
which together make it possible to replace `template_dir` usage
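A rough sketch of the replacement pattern; file names, template
variables, and the written paths are illustrative:

```hcl
locals {
  # Templates formerly rendered together via template_dir
  manifest_names = [
    "kube-apiserver.yaml",
    "kube-scheduler.yaml",
  ]
}

# templatefile renders each template natively; for_each (Terraform
# v0.12.6+) writes one local_file per rendered manifest.
resource "local_file" "manifests" {
  for_each = toset(local.manifest_names)

  filename = "${var.asset_dir}/manifests/${each.value}"
  content = templatefile("${path.module}/manifests/${each.value}", {
    cluster_name = var.cluster_name # hypothetical template variable
  })
}
```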
* Change the calico-node livenessProbe from an httpGet to an
exec of `calico-node -felix-ready`, as recommended by Calico
* Allow advertising Kubernetes service ClusterIPs
* Kubernetes v1.11 considered kube-proxy IPVS mode GA
* Many problems were found (https://github.com/poseidon/typhoon/pull/321)
* Since then, major blockers seem to have been addressed
* Rename from terraform-render-bootkube to terraform-render-bootstrap
* Generated manifest and certificate assets are no longer geared
specifically for bootkube (no longer used)
* Rename the organization in generated CA certificates for
clusters from bootkube to typhoon
* Mainly helpful to avoid confusion with bootkube CA certificates
if users inspect their CA, especially now that bootkube isn't used
(so their searches lead to Typhoon)
* Run kube-apiserver, kube-scheduler, and kube-controller-manager
as static pods on each controller node
* Bootstrap a minimal control plane by copying `static-manifests`
to the Kubelet `--pod-manifest-path` and tls/auth secrets to
`/etc/kubernetes/bootstrap-secrets`. Then, `kubectl apply` the
Kubernetes manifests.
* Discontinue using bootkube to bootstrap and pivot to a self-hosted
control plane.
* Remove bootkube self-hosted kube-apiserver DaemonSet and
kube-scheduler and kube-controller-manager Deployments
* Remove pod-checkpointer manifests (no longer needed)
Advantages:
* Reduce control plane bootstrapping complexity. Self-hosted pivot and
pod checkpointing worked well, but in-place edits to kube-apiserver,
kube-controller-manager, or kube-scheduler are infrequently used. The
concept was originally geared toward continuously in-place upgrading
clusters, a goal Typhoon doesn't take on (rec. blue/green clusters).
As such, the value-add doesn't justify the extra components for this
particular project.
* Static pods still provide kubectl visibility and log access
Drawbacks:
* In-place edits to kube-apiserver, kube-controller-manager, and
kube-scheduler are not possible via kubectl (non-goal)
* Assets must be copied to each controller (not just one)
* Static pods must load credentials via hostPath mounts, which is
less clean than the former Kubernetes secrets and service accounts
* Require that the bootstrap-kube-apiserver and kube-apiserver
components listen on port 6443 (internally) to allow kube-apiserver
pods to run with lower user privilege
* Remove variable `apiserver_port`. The kube-apiserver listen
port is no longer customizable.
* Add variable `external_apiserver_port` to allow architectures
where a load balancer fronts kube-apiserver 6443 backends, but
listens on a different port externally. For example, Google Cloud
TCP Proxy load balancers cannot listen on 6443
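A hedged example of the intended shape, assuming a load balancer
that listens externally on 443 and fronts kube-apiserver backends
on 6443 (other module inputs elided):

```hcl
module "bootstrap" {
  source = "git::https://github.com/poseidon/terraform-render-bootstrap.git"

  # Port clients and generated kubeconfigs use; kube-apiserver
  # itself always listens on 6443 internally.
  external_apiserver_port = 443
}
```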
* Terraform v0.12 is a major Terraform release with breaking changes
to the HCL language. In v0.11, it was required to use redundant
brackets as interpolation type hints to pass lists or to concat and
flatten lists and strings. In v0.12, that work-around is no longer
supported. Lists are represented as first-class objects and the
redundant brackets create nested lists. Consequently, it's not
possible to pass lists in a way that works with both v0.11 and
v0.12 at the same time. We've made the difficult choice to pursue
a hard cutover to Terraform v0.12.x (see the example below)
* https://www.terraform.io/upgrade-guides/0-12.html#referring-to-list-variables
* Use expression syntax instead of interpolated strings, where suggested
* Define Terraform required_version ~> v0.12.0 (i.e. >= v0.12.0, < v0.13)
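Concretely, the example referenced above, with an illustrative
list variable:

```hcl
module "bootstrap" {
  source = "git::https://github.com/poseidon/terraform-render-bootstrap.git"

  # Terraform v0.11 required redundant brackets as a type hint:
  #   etcd_servers = ["${var.etcd_servers}"]
  # Under v0.12, that form nests the list (e.g. [["10.3.0.1"]]),
  # so the first-class list is passed directly:
  etcd_servers = var.etcd_servers
}
```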
* Add the CoreDNS `ready` plugin and change the readinessProbe to
check its default port 8181 to ensure all plugins are ready
* `upstream [ADDRESS]` defines upstream resolvers for external
services. If no address is given, resolution is against CoreDNS
itself, which is the default behavior, so `upstream` can be removed
* Change flannel port from the kernel default 8472 to the
IANA assigned VXLAN port 4789
* Requires a change to firewall rules or security groups
depending on the platform (**action required!**; see the
sketch below)
* Why now? Calico now offers its own VXLAN backend so
standardizing on the IANA port simplifies configuration
* https://github.com/coreos/flannel/blob/master/Documentation/backends.md#vxlan
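As a hedged sketch of the firewall change on AWS (resource and
security group names are assumed):

```hcl
# Allow VXLAN between workers on the IANA-assigned port 4789,
# replacing a rule for the kernel default port 8472.
resource "aws_security_group_rule" "worker-vxlan" {
  security_group_id = aws_security_group.worker.id

  type      = "ingress"
  protocol  = "udp"
  from_port = 4789
  to_port   = 4789
  self      = true
}
```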
* Accept a `network_encapsulation` variable to choose whether the
default IPPool should use ipip (default) or vxlan encapsulation
* Use `network_mtu` as the MTU for workload interfaces for ipip
or vxlan (although Calico can have IPPools with a mix, we're
picking ipip xor vxlan)
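A sketch of opting into vxlan encapsulation, assuming a Calico
cluster and the variables described above (other inputs elided):

```hcl
module "bootstrap" {
  source = "git::https://github.com/poseidon/terraform-render-bootstrap.git"

  networking            = "calico" # assumed variable for CNI choice
  network_encapsulation = "vxlan"  # "ipip" is the default
  network_mtu           = 1450     # allows for VXLAN's ~50 byte overhead
}
```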