517 Commits

Author SHA1 Message Date
Dalton Hubble
fd10b94f87 Update Calico from v3.16.5 to v3.17.0
* Consider Calico's MTU auto-detection, but leave
Calico MTU variable for now (`network_mtu` ignored)
* Remove SELinux level setting workaround for
https://github.com/projectcalico/cni-plugin/issues/874
2020-11-25 11:18:59 -08:00
Dalton Hubble
49216ab82c Update Kubernetes from v1.19.3 to v1.19.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1194
2020-11-11 22:26:43 -08:00
Dalton Hubble
1c3d293f7c Update Cilium from v1.9.0-rc3 to v1.9.0
* https://github.com/cilium/cilium/releases/tag/v1.9.0
* https://github.com/cilium/cilium/pull/13937
2020-11-10 23:15:30 -08:00
Dalton Hubble
ef17534c33 Update Calico from v3.16.4 to v3.16.5
* https://docs.projectcalico.org/v3.16/release-notes/
2020-11-10 18:27:20 -08:00
Starbuck
74c299bf2c Restore kube-controller-manager --use-service-account-credentials
* kube-controller-manager Pods can start control loops with credentials
that have been granted relevant controller manager roles or using
generated service accounts bound to each role
* During the migration of the control plane from self-hosted to static
pods (https://github.com/poseidon/terraform-render-bootstrap/pull/148)
the flag for using separate service accounts was inadvertently dropped
* Restore the --use-service-account-credentials flag used before v1.16

Related:

* https://kubernetes.io/docs/reference/access-authn-authz/rbac/#controller-roles
* https://github.com/poseidon/terraform-render-bootstrap/pull/225
2020-11-10 12:06:51 -08:00
Dalton Hubble
c6e3a2bcdc Update Cilium from v1.8.5 to v1.9.0-rc3
* https://github.com/cilium/cilium/releases/tag/v1.9.0-rc3
* https://github.com/cilium/cilium/releases/tag/v1.9.0-rc2
* https://github.com/cilium/cilium/releases/tag/v1.9.0-rc1
2020-11-03 00:05:32 -08:00
Dalton Hubble
3a0feda171 Update Cilium from v1.8.4 to v1.8.5
* https://github.com/cilium/cilium/releases/tag/v1.8.5
2020-10-29 00:48:39 -07:00
Dalton Hubble
7036f64891 Update Calico from v3.16.3 to v3.16.4
* https://docs.projectcalico.org/v3.16/release-notes/
2020-10-25 11:50:43 -07:00
Dalton Hubble
9037d7311b Remove asset_dir variable and optional asset writes
* Originally, generated TLS certificates, manifests, and
cluster "assets" written to local disk (`asset_dir`) during
terraform apply cluster bootstrap
* Typhoon v1.17.0 introduced bootstrapping using only Terraform
state to store cluster assets, to avoid ever writing sensitive
materials to disk and improve automated use-cases. `asset_dir`
was changed to optional and defaulted to "" (no writes)
* Typhoon v1.18.0 deprecated the `asset_dir` variable, removed
docs, and announced it would be deleted in future.
* Remove the `asset_dir` variable

Cluster assets are now stored in Terraform state only. For those
who wish to write those assets to local files, this is possible
doing so explicitly.

```
resource local_file "assets" {
  for_each = module.bootstrap.assets_dist
  filename = "some-assets/${each.key}"
  content = each.value
}
```

Related:

* https://github.com/poseidon/typhoon/pull/595
* https://github.com/poseidon/typhoon/pull/678
2020-10-17 14:57:13 -07:00
Maikel
84f897b5f1 Add Cilium manifests to local_file asset_dir (#221)
* Note, asset_dir is deprecated https://github.com/poseidon/typhoon/pull/678
2020-10-17 14:30:50 -07:00
Dalton Hubble
7988fb7159 Update Calico from v3.15.3 to v3.16.3
* https://github.com/projectcalico/calico/releases/tag/v3.16.3
* https://docs.projectcalico.org/v3.16/release-notes/
2020-10-15 20:00:41 -07:00
Dalton Hubble
5bebcc5f00 Update Kubernetes from v1.19.2 to v1.19.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1193
2020-10-14 20:42:07 -07:00
Dalton Hubble
4448143f64 Update flannel from v0.13.0-rc2 to v0.13.0
* https://github.com/coreos/flannel/releases/tag/v0.13.0
2020-10-14 20:29:20 -07:00
Dalton Hubble
a2eb1dcbcf Update Cilium from v1.8.3 to v1.8.4
* https://github.com/cilium/cilium/releases/tag/v1.8.4
2020-10-02 00:20:19 -07:00
Dalton Hubble
d0f2123c59 Update Kubernetes from v1.19.1 to v1.19.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1192
2020-09-16 20:03:44 -07:00
Dalton Hubble
9315350f55 Update flannel and flannel-cni image versions
* Update flannel from v0.12.0 to v0.13.0-rc2
* Use new flannel multi-arch image
* Update flannel-cni to update CNI plugins from v0.8.6 to
v0.8.7
2020-09-16 20:02:42 -07:00
Nesc58
016d4ebd0c Mount /run/xtables.lock in flannel Daemonset
* Mount xtables.lock (like Calico and Cilium) since iptables
may be called by other processes (kube-proxy)
2020-09-16 19:01:42 -07:00
Dalton Hubble
f2dd897d67 Change seccomp annotations to Pod seccompProfile
* seccomp graduated to GA in Kubernetes v1.19. Support
for seccomp alpha annotations will be removed in v1.22
* Replace seccomp annotations with the GA seccompProfile
field in the PodTemplate securityContext
* Switch profile from `docker/default` to `runtime/default`
(no effective change, since docker is the runtime)
* Verify with docker inspect SecurityOpt. Without the
profile, you'd see `seccomp=unconfined`

Related:
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#seccomp-graduates-to-general-availability
2020-09-10 00:28:58 -07:00
Dalton Hubble
c72826908b Update Kubernetes from v1.19.0 to v1.19.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1191
2020-09-09 20:42:43 -07:00
Dalton Hubble
81ac7e6e2f Update Calico from v3.15.2 to v3.15.3
* https://github.com/projectcalico/calico/releases/tag/v3.15.3
2020-09-09 20:40:41 -07:00
Dalton Hubble
9ce9148557 Update Cilium from v1.8.2 to v1.8.3
* https://github.com/cilium/cilium/releases/tag/v1.8.3
2020-09-07 17:54:56 -07:00
Dalton Hubble
2686d59203 Allow leader election among Cilium operator replicas
* Allow Cilium operator Pods to leader elect when Deployment
has more than one replica
* Use topology spread constraint to keep multiple operators
from running on the same node (pods bind hostNetwork ports)
2020-09-07 17:48:19 -07:00
Dalton Hubble
79343f02ae Update Kubernetes from v1.18.8 to v1.19.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md/#v1190
2020-08-26 19:31:20 -07:00
Dalton Hubble
91738c35ff Update Calico from v3.15.1 to v3.15.2
* https://docs.projectcalico.org/release-notes/
2020-08-26 19:30:37 -07:00
Dalton Hubble
8ef2fe7c99 Update Kubernetes from v1.18.6 to v1.18.8
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1188
2020-08-13 20:43:42 -07:00
Dalton Hubble
60540868e0 Relax Terraform version constraint
* bootstrap uses only Hashicorp Terraform modules, allow
Terraform v0.13.x usage
2020-08-10 21:21:54 -07:00
Dalton Hubble
3675b3a539 Update from coreos/flannel-cni to poseidon/flannel-cni
* Update CNI plugins from v0.6.0 to v0.8.6 to fix several CVEs
* Update the base image to alpine:3.12
* Use `flannel-cni` as an init container and remove sleep
* Add Linux ARM64 and multi-arch container images
* https://github.com/poseidon/flannel-cni
* https://quay.io/repository/poseidon/flannel-cni

Background

* Switch from github.com/coreos/flannel-cni v0.3.0 which was last
published by me in 2017 and which is no longer accessible to me
to maintain or patch
* Port to the poseidon/flannel-cni rewrite, which releases v0.4.0
to continue the prior release numbering
2020-08-02 15:06:18 -07:00
Dalton Hubble
45053a62cb Update Cilium from v1.8.1 to v1.8.2
* Drop unused option https://github.com/cilium/cilium/pull/12618
2020-07-25 15:52:19 -07:00
Dalton Hubble
9de4267c28 Update CoreDNS from v1.6.7 to v1.7.0
* https://coredns.io/2020/06/15/coredns-1.7.0-release/
2020-07-25 13:08:29 -07:00
Dalton Hubble
835890025b Update Calico from v3.15.0 to v3.15.1
* https://docs.projectcalico.org/v3.15/release-notes/
2020-07-15 22:03:54 -07:00
Dalton Hubble
2bab6334ad Update Kubernetes from v1.18.5 to v1.18.6
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1186
2020-07-15 21:55:02 -07:00
Dalton Hubble
9a5132b2ad Update Cilium from v1.8.0 to v1.8.1
* https://github.com/cilium/cilium/releases/tag/v1.8.1
2020-07-05 15:58:53 -07:00
Dalton Hubble
5a7c963caf Update Kubernetes from v1.18.4 to v1.18.5
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1185
2020-06-27 13:49:10 -07:00
Dalton Hubble
5043456b05 Update Calico from v3.14.1 to v3.15.0
* https://docs.projectcalico.org/v3.15/release-notes/
2020-06-26 02:39:01 -07:00
Dalton Hubble
c014b77090 Update Cilium from v1.8.0-rc4 to v1.8.0
* https://github.com/cilium/cilium/releases/tag/v1.8.0
2020-06-22 22:25:38 -07:00
Dalton Hubble
1c07dfbc2a Remove experimental kube-router CNI provider 2020-06-21 21:55:56 -07:00
Dalton Hubble
af36c53936 Add experimental Cilium CNI provider
* Accept experimental CNI `networking` mode "cilium"
* Run Cilium v1.8.0 with overlay vxlan tunnels and a
minimal set of features. We're interested in:
  * IPAM: Divide pod_cidr into /24 subnets per node
  * CNI networking pod-to-pod, pod-to-external
  * BPF masquerade
  * NetworkPolicy as defined by Kubernetes (no L7)
* Continue using kube-proxy with Cilium probe mode
* Firewall changes:
  * Require UDP 8472 for vxlan (Linux kernel default) between nodes
  * Optional ICMP echo(8) between nodes for host reachability (health)
  * Optional TCP 4240 between nodes for host reachability (health)
2020-06-21 16:21:09 -07:00
Dalton Hubble
e75697ce35 Rename controller node label and NoSchedule taint
* Use node label `node.kubernetes.io/controller` to select
controller nodes (action required)
* Tolerate node taint `node-role.kubernetes.io/controller`
for workloads that should run on controller nodes. Don't
tolerate `node-role.kubernetes.io/master` (action required)
2020-06-17 22:46:35 -07:00
Dalton Hubble
3fe903d0ac Update Kubernetes from v1.18.3 to v1.18.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1184
* Remove unused template file
2020-06-17 19:23:12 -07:00
Dalton Hubble
fc1a7bac89 Remove unused Kubelet certificate and key pair
* Kubelet certificate and key pair in state (not distributed)
are not needed after with Kubelet TLS bootstrap
* https://github.com/poseidon/terraform-render-bootstrap/pull/185

Fix https://github.com/poseidon/typhoon/issues/757
2020-06-11 21:20:41 -07:00
Dalton Hubble
c3b1f23b5d Update Calico from v3.14.0 to v3.14.1
* https://docs.projectcalico.org/v3.14/release-notes/
2020-05-29 00:35:16 -07:00
Dalton Hubble
ff7ec52d0a Update Kubernetes from v1.18.2 to v1.18.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md
2020-05-20 20:34:42 -07:00
Dalton Hubble
a83ddbb30e Add CoreDNS "soft" nodeAffinity for controller nodes
* Add nodeAffinity to CoreDNS deployment PodSpec to
prefer running CoreDNS pods on controllers, while
relying on podAntiAffinity for spreading.
* For single master clusters, running two CoreDNS pods
on the master or running one pod on a worker is
permissible.
* Note: Its still _possible_ to end up with CoreDNS pods
all running on workers since we only express scheduling
preference ("soft"), but unlikely. Plus the motivating
scenario (below) is also rare.

Background:

* CoreDNS replicas are set to the higher of 2 or the
number of control plane nodes to (at a minimum) support
Deployment updates or pod restarts and match the cluster
size (e.g. 5 master/controller nodes likely means a
larger cluster, so run 5 CoreDNS replicas)
* In the past (before v1.14), we required kube-dns (CoreOS
predecessor) to run CoreDNS pods on master nodes. With
CoreDNS this node selection was relaxed. We'd like a
gentler form of it now.

Motivation:

* On clusters using 100% preemptible/spot workers, it is
possible that CoreDNS pods schedule to workers that are all
preempted at the same time, causing a loss of cluster internal
DNS service until a CoreDNS pod reschedules (1 min). We'd like
CoreDNS to prefer controller/master nodes (which aren't preempted)
to reduce the possibility of control plane disruption
2020-05-09 22:48:56 -07:00
Dalton Hubble
157336db92 Update Calico from v3.13.3 to v3.14.0
* https://docs.projectcalico.org/v3.14/release-notes/
2020-05-09 16:02:38 -07:00
Dalton Hubble
1dc36b58b8 Fix Calico node crash loop on Pod restart
* Set a consistent MCS level/range for Calico install-cni
* Note: Rebooting a node was a workaround, because Kubelet
relabels /etc/kubernetes(/cni/net.d)

Background:

* On SELinux enforcing systems, the Calico CNI install-cni
container ran with default SELinux context and a random MCS
pair. install-cni places CNI configs by first creating a
temporary file and then moving them into place, which means
the file MCS categories depend on the containers SELinux
context.
* calico-node Pod restarts creates a new install-cni container
with a different MCS pair that cannot access the earlier
written file (it places configs every time), causing the
init container to error and calico-node to crash loop
* https://github.com/projectcalico/cni-plugin/issues/874

```
mv: inter-device move failed: '/calico.conf.tmp' to
'/host/etc/cni/net.d/10-calico.conflist'; unable to remove target:
Permission denied
Failed to mv files. This may be caused by selinux configuration on the
host, or something else.
```

Note, this isn't a host SELinux configuration issue.
2020-05-09 15:20:06 -07:00
Dalton Hubble
924beb4b0c Enable Kubelet TLS bootstrap and NodeRestriction
* Enable bootstrap token authentication on kube-apiserver
* Generate the bootstrap.kubernetes.io/token Secret that
may be used as a bootstrap token
* Generate a bootstrap kubeconfig (with a bootstrap token)
to be securely distributed to nodes. Each Kubelet will use
the bootstrap kubeconfig to authenticate to kube-apiserver
as `system:bootstrappers` and send a node-unique CSR for
kube-controller-manager to automatically approve to issue
a Kubelet certificate and kubeconfig (expires in 72 hours)
* Add ClusterRoleBinding for bootstrap token subjects
(`system:bootstrappers`) to have the `system:node-bootstrapper`
ClusterRole
* Add ClusterRoleBinding for bootstrap token subjects
(`system:bootstrappers`) to have the csr nodeclient ClusterRole
* Add ClusterRoleBinding for bootstrap token subjects
(`system:bootstrappers`) to have the csr selfnodeclient ClusterRole
* Enable NodeRestriction admission controller to limit the
scope of Node or Pod objects a Kubelet can modify to those of
the node itself
* Ability for a Kubelet to delete its Node object is retained
as preemptible nodes or those in auto-scaling instance groups
need to be able to remove themselves on shutdown. This need
continues to have precedence over any risk of a node deleting
itself maliciously

Security notes:

1. Issued Kubelet certificates authenticate as user `system:node:NAME`
and group `system:nodes` and are limited in their authorization
to perform API operations by Node authorization and NodeRestriction
admission. Previously, a Kubelet's authorization was broader. This
is the primary security motivation.

2. The bootstrap kubeconfig credential has the same sensitivity
as the previous generated TLS client-certificate kubeconfig.
It must be distributed securely to nodes. Its compromise still
allows an attacker to obtain a Kubelet kubeconfig

3. Bootstrapping Kubelet kubeconfig's with a limited lifetime offers
a slight security improvement.
  * An attacker who obtains the kubeconfig can likely obtain the
  bootstrap kubeconfig as well, to obtain the ability to renew
  their access
  * A compromised bootstrap kubeconfig could plausibly be handled
  by replacing the bootstrap token Secret, distributing the token
  to new nodes, and expiration. Whereas a compromised TLS-client
  certificate kubeconfig can't be revoked (no CRL). However,
  replacing a bootstrap token can be impractical in real cluster
  environments, so the limited lifetime is mostly a theoretical
  benefit.
  * Cluster CSR objects are visible via kubectl which is nice

4. Bootstrapping node-unique Kubelet kubeconfigs means Kubelet
clients have more identity information, which can improve the
utility of audits and future features

Rel: https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet-tls-bootstrapping/
2020-04-25 19:38:56 -07:00
Dalton Hubble
c62c7f5a1a Update Calico from v3.13.1 to v3.13.3
* https://docs.projectcalico.org/v3.13/release-notes/
2020-04-22 20:26:32 -07:00
Dalton Hubble
14d0b20879 Update Kubernetes from v1.18.1 to v1.18.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#downloads-for-v1182
2020-04-16 23:33:42 -07:00
Dalton Hubble
1ad53d3b1c Update Kubernetes from v1.18.0 to v1.18.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md
2020-04-08 19:37:27 -07:00
Dalton Hubble
45dc2f5c0c Update flannel from v0.11.0 to v0.12.0
* https://github.com/coreos/flannel/releases/tag/v0.12.0
2020-03-31 18:23:57 -07:00