* Originally, generated TLS certificates, manifests, and
cluster "assets" were written to local disk (`asset_dir`)
during `terraform apply` cluster bootstrap
* Typhoon v1.17.0 introduced bootstrapping that stores cluster
assets only in Terraform state, to avoid ever writing sensitive
materials to disk and to improve automated use-cases. `asset_dir`
became optional and defaulted to "" (no writes)
* Typhoon v1.18.0 deprecated the `asset_dir` variable, removed
its docs, and announced it would be removed in a future release
* Remove the `asset_dir` variable
Cluster assets are now stored in Terraform state only. Those
who wish to write these assets to local files can still do so
explicitly, as shown below.
```
resource "local_file" "assets" {
  for_each = module.bootstrap.assets_dist
  filename = "some-assets/${each.key}"
  content  = each.value
}
```
Related:
* https://github.com/poseidon/typhoon/pull/595
* https://github.com/poseidon/typhoon/pull/678
* seccomp graduated to GA in Kubernetes v1.19. Support
for seccomp alpha annotations will be removed in v1.22
* Replace seccomp annotations with the GA seccompProfile
field in the PodTemplate securityContext (sketched below)
* Switch the profile from `docker/default` to `runtime/default`
(no effective change, since Docker is the runtime)
* Verify with `docker inspect` (check SecurityOpt). Without the
profile, you'd see `seccomp=unconfined`
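For reference, a minimal sketch of the GA field as it would appear in a
Deployment's PodTemplate (the labels and image are illustrative, not the
bundled manifests verbatim):
```
# Pod template fragment using the GA seccompProfile field, replacing the
# deprecated annotation `seccomp.security.alpha.kubernetes.io/pod: runtime/default`
spec:
  template:
    metadata:
      labels:
        app: example               # illustrative label
    spec:
      securityContext:
        seccompProfile:
          type: RuntimeDefault     # GA equivalent of `runtime/default`
      containers:
        - name: app
          image: nginx:1.19        # illustrative image
```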
Related:
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#seccomp-graduates-to-general-availability
* Allow Cilium operator Pods to leader-elect when the Deployment
has more than one replica
* Use a topology spread constraint to keep multiple operators
from running on the same node, since the Pods bind hostNetwork
ports (sketched below)
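A sketch of the kind of topology spread constraint meant here, in the
operator Deployment's pod template (the label selector value is illustrative):
```
# Spread cilium-operator replicas across nodes so two operators never land on
# the same node and contend for the same hostNetwork ports.
spec:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          name: cilium-operator    # illustrative label
```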
* Update CNI plugins from v0.6.0 to v0.8.6 to fix several CVEs
* Update the base image to alpine:3.12
* Use `flannel-cni` as an init container and remove the sleep
(see the sketch after this list)
* Add Linux ARM64 and multi-arch container images
* https://github.com/poseidon/flannel-cni
* https://quay.io/repository/poseidon/flannel-cni
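A rough sketch of the init container arrangement (the entrypoint and mount
paths are assumptions for illustration):
```
# DaemonSet pod template fragment: install CNI config/binaries once in an init
# container, instead of a sidecar container that sleeps forever after copying.
initContainers:
  - name: install-cni
    image: quay.io/poseidon/flannel-cni:v0.4.0
    command: ["/install-cni.sh"]       # assumed entrypoint
    volumeMounts:
      - name: cni-bin-dir
        mountPath: /host/opt/cni/bin   # assumed host paths
      - name: cni-conf-dir
        mountPath: /host/etc/cni/net.d
```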
Background:
* Switch away from github.com/coreos/flannel-cni v0.3.0, which
was last published by me in 2017 and is no longer accessible
to me to maintain or patch
* Port to the poseidon/flannel-cni rewrite, whose releases start
at v0.4.0 to continue the prior release numbering
* Accept experimental CNI `networking` mode "cilium"
* Run Cilium v1.8.0 with overlay vxlan tunnels and a
minimal set of features. We're interested in:
* IPAM: Divide pod_cidr into /24 subnets per node
* CNI networking pod-to-pod, pod-to-external
* BPF masquerade
* NetworkPolicy as defined by Kubernetes (no L7)
* Continue using kube-proxy with Cilium probe mode
* Firewall changes:
* Require UDP 8472 for vxlan (Linux kernel default) between nodes
* Optional ICMP echo(8) between nodes for host reachability (health)
* Optional TCP 4240 between nodes for host reachability (health)
* Use node label `node.kubernetes.io/controller` to select
controller nodes (action required)
* Tolerate the node taint `node-role.kubernetes.io/controller`
for workloads that should run on controller nodes; don't
tolerate `node-role.kubernetes.io/master` (action required,
sketched below)
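A sketch of scheduling a workload onto controllers with the new label and
taint (the label value and taint effect shown are assumptions):
```
# Pod spec fragment: select controller nodes and tolerate the controller taint
# (a `node-role.kubernetes.io/master` toleration is no longer honored).
spec:
  nodeSelector:
    node.kubernetes.io/controller: "true"   # assumed label value
  tolerations:
    - key: node-role.kubernetes.io/controller
      operator: Exists
      effect: NoSchedule                     # assumed taint effect
```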
* Add nodeAffinity to the CoreDNS Deployment PodSpec to
prefer running CoreDNS pods on controllers, while
relying on podAntiAffinity for spreading (sketched below).
* For single master clusters, running two CoreDNS pods
on the master or running one pod on a worker is
permissible.
* Note: It's still _possible_ to end up with CoreDNS pods
all running on workers, since we only express a scheduling
preference ("soft"), but it's unlikely. Plus, the motivating
scenario (below) is also rare.
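A sketch of the preferred ("soft") nodeAffinity described, alongside a
podAntiAffinity spread (weights and labels are illustrative):
```
# CoreDNS Deployment pod template fragment: prefer controllers, don't require them.
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
            - key: node.kubernetes.io/controller
              operator: Exists
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname
          labelSelector:
            matchLabels:
              k8s-app: coredns       # illustrative label
```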
Background:
* CoreDNS replicas are set to the higher of 2 or the
number of control plane nodes to (at a minimum) support
Deployment updates or pod restarts and match the cluster
size (e.g. 5 master/controller nodes likely means a
larger cluster, so run 5 CoreDNS replicas)
* In the past (before v1.14), kube-dns (CoreDNS's predecessor)
pods were required to run on master nodes. With CoreDNS, this
node selection was relaxed. We'd like a gentler form of it now.
Motivation:
* On clusters using 100% preemptible/spot workers, it is
possible that CoreDNS pods schedule onto workers that are all
preempted at the same time, causing a loss of cluster-internal
DNS service until a CoreDNS pod reschedules (~1 min). We'd like
CoreDNS to prefer controller/master nodes (which aren't preempted)
to reduce the possibility of control plane disruption
* Set a consistent MCS level/range for Calico install-cni
(sketched below)
* Note: Rebooting a node was a workaround, because the Kubelet
relabels /etc/kubernetes(/cni/net.d)
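A sketch of what pinning a consistent MCS level on the install-cni container
could look like (the image tag and level value are purely illustrative):
```
# calico-node DaemonSet fragment: give install-cni a fixed SELinux level so the
# CNI config files it writes keep the same MCS categories across Pod restarts.
initContainers:
  - name: install-cni
    image: calico/cni:v3.15.1        # illustrative version
    securityContext:
      seLinuxOptions:
        level: "s0:c123,c456"        # illustrative fixed MCS pair
```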
Background:
* On SELinux enforcing systems, the Calico CNI install-cni
container ran with the default SELinux context and a random MCS
pair. install-cni places CNI configs by first creating a
temporary file and then moving it into place, which means
the file's MCS categories depend on the container's SELinux
context.
* A calico-node Pod restart creates a new install-cni container
with a different MCS pair that cannot access the earlier
written file (it places configs every time), causing the
init container to error and calico-node to crash loop
* https://github.com/projectcalico/cni-plugin/issues/874
```
mv: inter-device move failed: '/calico.conf.tmp' to
'/host/etc/cni/net.d/10-calico.conflist'; unable to remove target:
Permission denied
Failed to mv files. This may be caused by selinux configuration on the
host, or something else.
```
Note: this isn't a host SELinux configuration issue.
* Enable bootstrap token authentication on kube-apiserver
* Generate the bootstrap.kubernetes.io/token Secret that
may be used as a bootstrap token (see the sketch after this list)
* Generate a bootstrap kubeconfig (with a bootstrap token)
to be securely distributed to nodes. Each Kubelet will use
the bootstrap kubeconfig to authenticate to kube-apiserver
as `system:bootstrappers` and send a node-unique CSR, which
kube-controller-manager automatically approves to issue
a Kubelet certificate and kubeconfig (expires in 72 hours)
* Add ClusterRoleBinding for bootstrap token subjects
(`system:bootstrappers`) to have the `system:node-bootstrapper`
ClusterRole
* Add ClusterRoleBinding for bootstrap token subjects
(`system:bootstrappers`) to have the csr nodeclient ClusterRole
* Add ClusterRoleBinding for bootstrap token subjects
(`system:bootstrappers`) to have the csr selfnodeclient ClusterRole
* Enable NodeRestriction admission controller to limit the
scope of Node or Pod objects a Kubelet can modify to those of
the node itself
* The ability for a Kubelet to delete its Node object is
retained, since preemptible nodes or those in auto-scaling
instance groups need to be able to remove themselves on
shutdown. This need continues to take precedence over any
risk of a node deleting itself maliciously
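For illustration, a sketch of the bootstrap token Secret and one of the
ClusterRoleBindings described above (the token values and binding name are
placeholders, not real credentials or the generated manifests verbatim):
```
# Bootstrap token Secret read by kube-apiserver's bootstrap token authenticator.
# The usable token is "<token-id>.<token-secret>"; the values here are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: bootstrap-token-07401b       # must be named bootstrap-token-<token-id>
  namespace: kube-system
type: bootstrap.kubernetes.io/token
stringData:
  token-id: "07401b"                 # placeholder
  token-secret: "f395accd246ae52d"   # placeholder
  usage-bootstrap-authentication: "true"
---
# Grant bootstrap token subjects the ability to submit CSRs.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: node-bootstrapper            # illustrative name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:node-bootstrapper
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: Group
    name: system:bootstrappers
```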
Security notes:
1. Issued Kubelet certificates authenticate as user `system:node:NAME`
and group `system:nodes` and are limited in their authorization
to perform API operations by Node authorization and NodeRestriction
admission. Previously, a Kubelet's authorization was broader. This
is the primary security motivation.
2. The bootstrap kubeconfig credential has the same sensitivity
as the previously generated TLS client-certificate kubeconfig.
It must be distributed securely to nodes. Its compromise still
allows an attacker to obtain a Kubelet kubeconfig
3. Bootstrapping Kubelet kubeconfigs with a limited lifetime offers
a slight security improvement.
* An attacker who obtains the kubeconfig can likely obtain the
bootstrap kubeconfig as well, to obtain the ability to renew
their access
* A compromised bootstrap kubeconfig could plausibly be handled
by replacing the bootstrap token Secret, distributing the token
to new nodes, and relying on expiration, whereas a compromised
TLS client-certificate kubeconfig can't be revoked (no CRL).
However, replacing a bootstrap token can be impractical in real
cluster environments, so the limited lifetime is mostly a
theoretical benefit.
* Cluster CSR objects are visible via kubectl which is nice
4. Bootstrapping node-unique Kubelet kubeconfigs means Kubelet
clients have more identity information, which can improve the
utility of audits and future features
Rel: https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet-tls-bootstrapping/