Commit Graph

159 Commits

Author SHA1 Message Date
Dalton Hubble
97fe45c93e Update Calico from v3.23.1 to v3.23.3
* https://github.com/projectcalico/calico/releases/tag/v3.23.3
2022-07-30 18:10:02 -07:00
Dalton Hubble
178664d84e Update Calico from v3.22.2 to v3.23.1
* https://github.com/projectcalico/calico/releases/tag/v3.23.1
2022-06-18 18:49:58 -07:00
Dalton Hubble
f325be5041 Update Cilium from v1.11.4 to v1.11.5
* https://github.com/cilium/cilium/releases/tag/v1.11.5
2022-05-31 15:21:36 +01:00
James Harmison
5bbca44f66 Update cilium ds name and label to align with upstream 2022-04-20 18:47:59 -07:00
Dalton Hubble
fa4745d155 Update Calico from v3.21.2 to v3.22.1
* Calico aims to fix https://github.com/projectcalico/calico/issues/5011
2022-03-11 10:57:07 -08:00
Dalton Hubble
37f45cb28b Update Cilium from v1.10.5 to v1.11.0
* https://github.com/cilium/cilium/releases/tag/v1.11.0
2021-12-10 11:23:56 -08:00
Dalton Hubble
8add7022d1 Normalize CA certs mounts in static Pods and kube-proxy
* Mount both /etc/ssl/certs and /etc/pki into control plane static
pods and kube-proxy, rather than choosing one based on a variable
(set based on Flatcar Linux or Fedora CoreOS)
* Remove `trusted_certs_dir` variable
* Remove deprecated `--port` from `kube-scheduler` static Pod
2021-12-09 09:26:28 -08:00
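A minimal sketch of the mount shape described above (pod name, image, and paths abbreviated; not the rendered Typhoon manifest):

```
# Mount both host certificate directories read-only, regardless of
# whether the host is Flatcar Linux or Fedora CoreOS.
spec:
  containers:
    - name: kube-apiserver
      image: k8s.gcr.io/kube-apiserver   # illustrative image reference
      volumeMounts:
        - name: ssl-certs-host
          mountPath: /etc/ssl/certs
          readOnly: true
        - name: pki-host
          mountPath: /etc/pki
          readOnly: true
  volumes:
    - name: ssl-certs-host
      hostPath:
        path: /etc/ssl/certs
    - name: pki-host
      hostPath:
        path: /etc/pki
```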
Dalton Hubble
362158a6d6 Add missing caliconodestatuses CRD for Calico
* https://github.com/projectcalico/calico/pull/5012
2021-12-09 09:19:12 -08:00
Dalton Hubble
9f9d7708c3 Update Calico and flannel CNI providers
* Update Calico from v3.20.2 to v3.21.0
* Update flannel from v0.14.0 to v0.15.0
2021-11-11 14:25:11 -08:00
Dalton Hubble
0b102c4089 Update Calico from v3.20.1 to v3.20.2
* https://github.com/projectcalico/calico/releases/tag/v3.20.2
* Add support for iptables legacy vs nft detection
2021-10-05 19:33:09 -07:00
Dalton Hubble
c6fa09bda1 Update Calico and Cilium CNI providers
* Update Calico from v3.20.0 to v3.20.1
* Update Cilium from v1.10.3 to v1.10.4
* Remove Cilium wait for BPF mount
2021-09-21 09:11:49 -07:00
Dalton Hubble
bfc2fa9697 Fix ClusterIP access when using Cilium
* When a router sets node(s) as next-hops in a network,
ClusterIP Services should be able to respond as usual
* https://github.com/cilium/cilium/issues/14581
2021-09-15 19:43:58 -07:00
Dalton Hubble
a2e1cdfd8a Update Calico from v3.19.2 to v3.20.0
* https://github.com/projectcalico/calico/blob/v3.20.0/_includes/charts/calico/templates/calico-node.yaml
2021-08-18 19:43:40 -07:00
Dalton Hubble
b5f5d843ec Disable kube-scheduler insecure port
* Kubernetes v1.22.0 disables the kube-controller-manager insecure
port, which was used internally for Prometheus metrics scraping.
In Typhoon, we'll switch to using the https port, which requires
Prometheus to present a bearer token
* Go ahead and disable the insecure port for kube-scheduler too;
we'll configure Prometheus to scrape it with a bearer token as
well
* Remove unused kube-apiserver `--port` flag

Rel:

* https://github.com/kubernetes/kubernetes/pull/96216
2021-08-10 21:11:30 -07:00
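For illustration, a hedged sketch of a Prometheus scrape_config that uses the secure port and a ServiceAccount bearer token (job name, target address, and TLS settings are assumptions, not Typhoon's actual configuration):

```
scrape_configs:
  - job_name: kube-scheduler
    scheme: https
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecure_skip_verify: true   # the scheduler's serving cert may not cover the scrape address
    static_configs:
      - targets: ['10.0.0.10:10259']   # hypothetical controller node; 10259 is the kube-scheduler secure port
```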
Dalton Hubble
5c0bebc1e7 Add Cilium init container to auto-mount cgroup2
* Add init container to auto-mount /sys/fs/cgroup cgroup2
at /run/cilium/cgroupv2 for the Cilium agent
* Enable CNI exclusive mode, to disable other configs
found in /etc/cni/net.d/
* https://github.com/cilium/cilium/pull/16259
2021-07-24 10:30:06 -07:00
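Roughly, the init container pattern looks like the sketch below. This is simplified: Cilium's real manifest performs the mount in the host mount namespace, and the image tag, script, and volume names here are illustrative.

```
initContainers:
  - name: mount-cgroup
    image: quay.io/cilium/cilium:v1.10.3   # assumed tag for illustration
    securityContext:
      privileged: true
    command:
      - sh
      - -ec
      # create the mount point and mount a cgroup2 filesystem if absent
      - |
        mkdir -p /run/cilium/cgroupv2
        grep -qs '/run/cilium/cgroupv2' /proc/mounts || mount -t cgroup2 none /run/cilium/cgroupv2
    volumeMounts:
      - name: cilium-run
        mountPath: /run/cilium
        mountPropagation: Bidirectional   # so the mount is visible on the host
volumes:
  - name: cilium-run
    hostPath:
      path: /run/cilium
      type: DirectoryOrCreate
```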
Dalton Hubble
362f42a7a2 Update CoreDNS from v1.8.0 to v1.8.4
* https://coredns.io/2021/01/20/coredns-1.8.1-release/
* https://coredns.io/2021/02/23/coredns-1.8.2-release/
* https://coredns.io/2021/02/24/coredns-1.8.3-release/
* https://coredns.io/2021/05/28/coredns-1.8.4-release/
2021-06-23 23:26:27 -07:00
Dalton Hubble
7052c66882 Update Calico from v3.18.1 to v3.19.0
* https://docs.projectcalico.org/archive/v3.19/release-notes/
2021-05-13 11:17:48 -07:00
Dalton Hubble
a4ecf168df Update static Pod manifests for Kubernetes v1.21.0
* Set `kube-apiserver` `service-account-jwks-uri` because conformance
ServiceAccountIssuerDiscovery OIDC discovery will access a JWT endpoint
using the kube-apiserver's advertise address by default, instead of
using the intended in-cluster service (10.3.0.1) resolved by cluster DNS
`kubernetes.default.svc.cluster.local`, which causes a cert SAN error
* Set the authentication and authorization kubeconfig for kube-scheduler
and kube-controller-manager. Here, authn/z refer to aggregated API
use cases only, so it's not strictly necessary and warnings about
missing `extension-apiserver-authentication` when enable_aggregation
is false can be ignored
* Mount `/var/lib/kubelet/volumeplugins` to the default location
expected within kube-controller-manager to remove the need for a flag
* Enable `tokencleaner` controller to automatically delete expired
bootstrap tokens (the default node token is good for 1 year, so cleanup won't
really matter at that point, but enable regardless)
* Remove unused `cloud-provider` flag; we never intend to use in-tree
cloud providers or support custom providers
2021-04-10 17:42:18 -07:00
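As a hedged sketch, the flag changes above amount to fragments like the following, grouped by component (the issuer URL, kubeconfig paths, and controller list are illustrative rather than the exact rendered values):

```
kube-apiserver:
  - --service-account-issuer=https://kubernetes.default.svc.cluster.local
  - --service-account-jwks-uri=https://kubernetes.default.svc.cluster.local/openid/v1/jwks
kube-scheduler:
  - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
  - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
kube-controller-manager:
  - --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
  - --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
  - --controllers=*,tokencleaner   # enable the tokencleaner controller alongside the defaults
```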
Dalton Hubble
f87aa7f96a Change CNI config directory to /etc/cni/net.d
* Change CNI config directory from `/etc/kubernetes/cni/net.d`
to `/etc/cni/net.d` (Kubelet default)
2021-04-01 16:48:46 -07:00
Dalton Hubble
adcba1c211 Update Calico from v3.17.3 to v3.18.1
* https://docs.projectcalico.org/archive/v3.18/release-notes/
2021-03-14 10:15:39 -07:00
Dalton Hubble
84972373d4 Rename bootstrap-secrets directory to pki
* Change control plane static pods to mount `/etc/kubernetes/pki`,
instead of `/etc/kubernetes/bootstrap-secrets` to better reflect
their purpose and match some loose conventions upstream
* Require TLS assets to be placed at `/etc/kubernetes/pki`, instead
of `/etc/kubernetes/bootstrap-secrets` on hosts (breaking)
* Mount to `/etc/kubernetes/pki` to match the host (less surprise)
* https://kubernetes.io/docs/setup/best-practices/certificates/
2020-12-02 23:13:53 -08:00
Dalton Hubble
ac5cb95774 Generate kubeconfig's for kube-scheduler and kube-controller-manager
* Generate TLS client certificates for kube-scheduler and
kube-controller-manager with `system:kube-scheduler` and
`system:kube-controller-manager` CNs
* Template separate kubeconfigs for kube-scheduler and
kube-controller-manager (`scheduler.conf` and
`controller-manager.conf`). Rename admin for clarity
* Before v1.16.0, Typhoon scheduled a self-hosted control
plane, which allowed the steady-state kube-scheduler and
kube-controller-manager to use a scoped ServiceAccount.
With a static pod control plane, separate CN TLS client
certificates are the nearest equivalent.
* https://kubernetes.io/docs/setup/best-practices/certificates/
* Remove unused Kubelet certificate, TLS bootstrap is used
instead
2020-12-01 20:18:36 -08:00
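The generated kubeconfigs take roughly this shape (values elided; the server endpoint is a placeholder). The identity comes from the client certificate's CN, e.g. `system:kube-scheduler` in `scheduler.conf`:

```
apiVersion: v1
kind: Config
clusters:
  - name: local
    cluster:
      server: https://cluster.example.com:6443   # placeholder apiserver endpoint
      certificate-authority-data: <base64 CA certificate>
users:
  - name: kube-scheduler
    user:
      client-certificate-data: <base64 cert with CN=system:kube-scheduler>
      client-key-data: <base64 key>
contexts:
  - name: local
    context:
      cluster: local
      user: kube-scheduler
current-context: local
```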
Dalton Hubble
19c3ce61bd Add TokenReview and TokenRequestProjection kube-apiserver flags
* Add kube-apiserver flags for TokenReview and TokenRequestProjection
(beta, defaults on) to allow using Service Account Token Volume Projection
to create and mount service account tokens tied to a Pod's lifecycle
* Both features will be promoted from beta to stable in v1.20
* Rename `experimental-cluster-signing-duration` to just
`cluster-signing-duration`

Rel:

* https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#service-account-token-volume-projection
2020-12-01 19:50:25 -08:00
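Once these flags are set, a workload can request a projected, lifecycle-bound token, as in this example following the upstream docs pattern (names, audience, and expiry are illustrative):

```
apiVersion: v1
kind: Pod
metadata:
  name: token-projection-demo
spec:
  serviceAccountName: default
  containers:
    - name: app
      image: nginx   # placeholder image
      volumeMounts:
        - name: projected-token
          mountPath: /var/run/secrets/tokens
          readOnly: true
  volumes:
    - name: projected-token
      projected:
        sources:
          - serviceAccountToken:
              path: app-token
              audience: my-audience   # hypothetical audience
              expirationSeconds: 7200
```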
Dalton Hubble
fd10b94f87 Update Calico from v3.16.5 to v3.17.0
* Consider Calico's MTU auto-detection, but leave
Calico MTU variable for now (`network_mtu` ignored)
* Remove SELinux level setting workaround for
https://github.com/projectcalico/cni-plugin/issues/874
2020-11-25 11:18:59 -08:00
Starbuck
74c299bf2c Restore kube-controller-manager --use-service-account-credentials
* kube-controller-manager Pods can start control loops with credentials
that have been granted relevant controller manager roles or using
generated service accounts bound to each role
* During the migration of the control plane from self-hosted to static
pods (https://github.com/poseidon/terraform-render-bootstrap/pull/148)
the flag for using separate service accounts was inadvertently dropped
* Restore the --use-service-account-credentials flag used before v1.16

Related:

* https://kubernetes.io/docs/reference/access-authn-authz/rbac/#controller-roles
* https://github.com/poseidon/terraform-render-bootstrap/pull/225
2020-11-10 12:06:51 -08:00
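The restored setting is a single flag on the kube-controller-manager static Pod command, sketched below:

```
command:
  - kube-controller-manager
  # each controller loop authenticates with its own ServiceAccount,
  # bound to a scoped controller role, instead of shared credentials
  - --use-service-account-credentials=true
```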
Dalton Hubble
c6e3a2bcdc Update Cilium from v1.8.5 to v1.9.0-rc3
* https://github.com/cilium/cilium/releases/tag/v1.9.0-rc3
* https://github.com/cilium/cilium/releases/tag/v1.9.0-rc2
* https://github.com/cilium/cilium/releases/tag/v1.9.0-rc1
2020-11-03 00:05:32 -08:00
Dalton Hubble
7988fb7159 Update Calico from v3.15.3 to v3.16.3
* https://github.com/projectcalico/calico/releases/tag/v3.16.3
* https://docs.projectcalico.org/v3.16/release-notes/
2020-10-15 20:00:41 -07:00
Nesc58
016d4ebd0c Mount /run/xtables.lock in flannel Daemonset
* Mount xtables.lock (like Calico and Cilium) since iptables
may be called by other processes (kube-proxy)
2020-09-16 19:01:42 -07:00
Dalton Hubble
f2dd897d67 Change seccomp annotations to Pod seccompProfile
* seccomp graduated to GA in Kubernetes v1.19. Support
for seccomp alpha annotations will be removed in v1.22
* Replace seccomp annotations with the GA seccompProfile
field in the PodTemplate securityContext
* Switch profile from `docker/default` to `runtime/default`
(no effective change, since docker is the runtime)
* Verify with docker inspect SecurityOpt. Without the
profile, you'd see `seccomp=unconfined`

Related:
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#seccomp-graduates-to-general-availability
2020-09-10 00:28:58 -07:00
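The change replaces the alpha annotation with the GA field on the pod-level securityContext, roughly as sketched here (labels are placeholders):

```
spec:
  template:
    metadata:
      labels:
        k8s-app: example   # placeholder; the alpha seccomp annotations are removed
    spec:
      securityContext:
        seccompProfile:
          type: RuntimeDefault   # GA equivalent of the runtime/default profile
```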
Dalton Hubble
2686d59203 Allow leader election among Cilium operator replicas
* Allow Cilium operator Pods to leader elect when Deployment
has more than one replica
* Use topology spread constraint to keep multiple operators
from running on the same node (pods bind hostNetwork ports)
2020-09-07 17:48:19 -07:00
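A sketch of the Deployment pieces involved (replica count and labels are illustrative): more than one replica with leader election, spread across nodes so the hostNetwork ports don't collide.

```
spec:
  replicas: 2
  template:
    spec:
      hostNetwork: true
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule   # never co-locate two operators on one node
          labelSelector:
            matchLabels:
              name: cilium-operator   # assumed Pod label
```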
Dalton Hubble
3675b3a539 Update from coreos/flannel-cni to poseidon/flannel-cni
* Update CNI plugins from v0.6.0 to v0.8.6 to fix several CVEs
* Update the base image to alpine:3.12
* Use `flannel-cni` as an init container and remove sleep
* Add Linux ARM64 and multi-arch container images
* https://github.com/poseidon/flannel-cni
* https://quay.io/repository/poseidon/flannel-cni

Background

* Switch from github.com/coreos/flannel-cni v0.3.0, which was last
published by me in 2017 and is no longer accessible to me
to maintain or patch
* Port to the poseidon/flannel-cni rewrite, which releases v0.4.0
to continue the prior release numbering
2020-08-02 15:06:18 -07:00
Dalton Hubble
45053a62cb Update Cilium from v1.8.1 to v1.8.2
* Drop unused option https://github.com/cilium/cilium/pull/12618
2020-07-25 15:52:19 -07:00
Dalton Hubble
1c07dfbc2a Remove experimental kube-router CNI provider 2020-06-21 21:55:56 -07:00
Dalton Hubble
af36c53936 Add experimental Cilium CNI provider
* Accept experimental CNI `networking` mode "cilium"
* Run Cilium v1.8.0 with overlay vxlan tunnels and a
minimal set of features. We're interested in:
  * IPAM: Divide pod_cidr into /24 subnets per node
  * CNI networking pod-to-pod, pod-to-external
  * BPF masquerade
  * NetworkPolicy as defined by Kubernetes (no L7)
* Continue using kube-proxy with Cilium probe mode
* Firewall changes:
  * Require UDP 8472 for vxlan (Linux kernel default) between nodes
  * Optional ICMP echo(8) between nodes for host reachability (health)
  * Optional TCP 4240 between nodes for host reachability (health)
2020-06-21 16:21:09 -07:00
Dalton Hubble
e75697ce35 Rename controller node label and NoSchedule taint
* Use node label `node.kubernetes.io/controller` to select
controller nodes (action required)
* Tolerate node taint `node-role.kubernetes.io/controller`
for workloads that should run on controller nodes. Don't
tolerate `node-role.kubernetes.io/master` (action required)
2020-06-17 22:46:35 -07:00
Dalton Hubble
3fe903d0ac Update Kubernetes from v1.18.3 to v1.18.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1184
* Remove unused template file
2020-06-17 19:23:12 -07:00
Dalton Hubble
a83ddbb30e Add CoreDNS "soft" nodeAffinity for controller nodes
* Add nodeAffinity to CoreDNS deployment PodSpec to
prefer running CoreDNS pods on controllers, while
relying on podAntiAffinity for spreading.
* For single master clusters, running two CoreDNS pods
on the master or running one pod on a worker is
permissible.
* Note: It's still _possible_ to end up with CoreDNS pods
all running on workers since we only express scheduling
preference ("soft"), but unlikely. Plus the motivating
scenario (below) is also rare.

Background:

* CoreDNS replicas are set to the higher of 2 or the
number of control plane nodes to (at a minimum) support
Deployment updates or pod restarts and match the cluster
size (e.g. 5 master/controller nodes likely means a
larger cluster, so run 5 CoreDNS replicas)
* In the past (before v1.14), we required kube-dns (CoreDNS's
predecessor) pods to run on master nodes. With
CoreDNS, this node selection was relaxed. We'd like a
gentler form of it now.

Motivation:

* On clusters using 100% preemptible/spot workers, it is
possible that CoreDNS pods schedule to workers that are all
preempted at the same time, causing a loss of cluster internal
DNS service until a CoreDNS pod reschedules (1 min). We'd like
CoreDNS to prefer controller/master nodes (which aren't preempted)
to reduce the possibility of control plane disruption
2020-05-09 22:48:56 -07:00
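The "soft" preference plus anti-affinity spreading looks roughly like the sketch below (the controller node label key and the CoreDNS pod label are assumptions):

```
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
            - key: node.kubernetes.io/controller   # assumed controller node label
              operator: Exists
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname
          labelSelector:
            matchLabels:
              k8s-app: coredns   # placeholder CoreDNS pod label
```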
Dalton Hubble
1dc36b58b8 Fix Calico node crash loop on Pod restart
* Set a consistent MCS level/range for Calico install-cni
* Note: Rebooting a node was a workaround, because Kubelet
relabels /etc/kubernetes(/cni/net.d)

Background:

* On SELinux enforcing systems, the Calico CNI install-cni
container ran with default SELinux context and a random MCS
pair. install-cni places CNI configs by first creating a
temporary file and then moving them into place, which means
the file's MCS categories depend on the container's SELinux
context.
* A calico-node Pod restart creates a new install-cni container
with a different MCS pair that cannot access the earlier-written
file (it places configs every time), causing the
init container to error and calico-node to crash loop
* https://github.com/projectcalico/cni-plugin/issues/874

```
mv: inter-device move failed: '/calico.conf.tmp' to
'/host/etc/cni/net.d/10-calico.conflist'; unable to remove target:
Permission denied
Failed to mv files. This may be caused by selinux configuration on the
host, or something else.
```

Note, this isn't a host SELinux configuration issue.
2020-05-09 15:20:06 -07:00
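The fix pins an explicit SELinux level on the install-cni init container so successive containers write files with the same MCS categories; the level value below is illustrative, not the rendered one.

```
initContainers:
  - name: install-cni
    securityContext:
      seLinuxOptions:
        level: "s0:c123,c456"   # illustrative fixed MCS pair, consistent across restarts
```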
Dalton Hubble
924beb4b0c Enable Kubelet TLS bootstrap and NodeRestriction
* Enable bootstrap token authentication on kube-apiserver
* Generate the bootstrap.kubernetes.io/token Secret that
may be used as a bootstrap token
* Generate a bootstrap kubeconfig (with a bootstrap token)
to be securely distributed to nodes. Each Kubelet will use
the bootstrap kubeconfig to authenticate to kube-apiserver
as `system:bootstrappers` and send a node-unique CSR for
kube-controller-manager to automatically approve to issue
a Kubelet certificate and kubeconfig (expires in 72 hours)
* Add ClusterRoleBinding for bootstrap token subjects
(`system:bootstrappers`) to have the `system:node-bootstrapper`
ClusterRole
* Add ClusterRoleBinding for bootstrap token subjects
(`system:bootstrappers`) to have the csr nodeclient ClusterRole
* Add ClusterRoleBinding for bootstrap token subjects
(`system:bootstrappers`) to have the csr selfnodeclient ClusterRole
* Enable NodeRestriction admission controller to limit the
scope of Node or Pod objects a Kubelet can modify to those of
the node itself
* The ability for a Kubelet to delete its Node object is retained,
since preemptible nodes or those in auto-scaling instance groups
need to be able to remove themselves on shutdown. This need
continues to have precedence over any risk of a node deleting
itself maliciously

Security notes:

1. Issued Kubelet certificates authenticate as user `system:node:NAME`
and group `system:nodes` and are limited in their authorization
to perform API operations by Node authorization and NodeRestriction
admission. Previously, a Kubelet's authorization was broader. This
is the primary security motivation.

2. The bootstrap kubeconfig credential has the same sensitivity
as the previous generated TLS client-certificate kubeconfig.
It must be distributed securely to nodes. Its compromise still
allows an attacker to obtain a Kubelet kubeconfig

3. Bootstrapping Kubelet kubeconfigs with a limited lifetime offers
a slight security improvement.
  * An attacker who obtains the kubeconfig can likely obtain the
  bootstrap kubeconfig as well, to obtain the ability to renew
  their access
  * A compromised bootstrap kubeconfig could plausibly be handled
  by replacing the bootstrap token Secret, distributing the token
  to new nodes, and expiration. Whereas a compromised TLS-client
  certificate kubeconfig can't be revoked (no CRL). However,
  replacing a bootstrap token can be impractical in real cluster
  environments, so the limited lifetime is mostly a theoretical
  benefit.
  * Cluster CSR objects are visible via kubectl, which is nice

4. Bootstrapping node-unique Kubelet kubeconfigs means Kubelet
clients have more identity information, which can improve the
utility of audits and future features

Rel: https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet-tls-bootstrapping/
2020-04-25 19:38:56 -07:00
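The generated bootstrap token Secret has the standard shape shown below (token values are placeholders, not real credentials):

```
apiVersion: v1
kind: Secret
metadata:
  name: bootstrap-token-abcdef        # name must be bootstrap-token-<token-id>
  namespace: kube-system
type: bootstrap.kubernetes.io/token
stringData:
  token-id: abcdef                    # placeholder 6-character id
  token-secret: 0123456789abcdef      # placeholder 16-character secret
  usage-bootstrap-authentication: "true"
```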
Dalton Hubble
42723d13a6 Change default kube-system DaemonSet tolerations
* Change kube-proxy, flannel, and calico-node DaemonSet
tolerations to tolerate `node.kubernetes.io/not-ready`
and `node-role.kubernetes.io/master` (i.e. controllers)
explicitly, rather than tolerating all taints
* kube-system DaemonSets will no longer tolerate custom
node taints by default. Instead, custom node taints must
be enumerated to opt-in to scheduling/executing the
kube-system DaemonSets.

Background: Tolerating all taints ruled out use-cases
where certain nodes might legitimately need to keep
kube-proxy or CNI networking disabled
2020-03-25 22:43:50 -07:00
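A sketch of the explicit tolerations that replace the blanket tolerate-everything entry (taint keys per the commit; the effects shown are assumptions):

```
tolerations:
  # schedule onto controller nodes
  - key: node-role.kubernetes.io/master
    effect: NoSchedule
  # run on nodes that are not yet ready (any effect)
  - key: node.kubernetes.io/not-ready
    operator: Exists
```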
Dalton Hubble
e76f0a09fa Switch from upstream hyperkube to component images
* Kubernetes plans to stop releasing the hyperkube image in
the future.
* Upstream will continue releasing container images for
`kube-apiserver`, `kube-controller-manager`, `kube-proxy`,
and `kube-scheduler`. Typhoon will use these images
* Upstream will release the kubelet as a binary for distros
to package, either as a traditional DEB/RPM or as a container
image for container-optimized operating systems. Typhoon will
take on the packaging of Kubelet and its dependencies as a new
container image (alongside kubectl)

Rel: https://github.com/kubernetes/kubernetes/pull/88676
See: https://github.com/poseidon/kubelet
2020-03-17 22:13:42 -07:00
Dalton Hubble
804029edd5 Update Calico from v3.12.0 to v3.13.1
* https://docs.projectcalico.org/v3.13/release-notes/
2020-03-12 22:55:57 -07:00
Dalton Hubble
ac4b7af570 Configure kube-proxy to serve /metrics on 0.0.0.0:10249
* Set kube-proxy --metrics-bind-address to 0.0.0.0 (default
127.0.0.1) so Prometheus metrics can be scraped
* Add pod port list (informational only)
* Require node firewall rules to be updated before scrapes
can succeed
2019-12-29 11:56:52 -08:00
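The DaemonSet change is roughly the following sketch (10249 is kube-proxy's default metrics port; the port entry is informational only):

```
containers:
  - name: kube-proxy
    command:
      - kube-proxy
      - --metrics-bind-address=0.0.0.0   # default is 127.0.0.1
    ports:
      - name: metrics
        containerPort: 10249   # informational; no Service is created
        protocol: TCP
```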
Dalton Hubble
4369c706e2 Restore kube-controller-manager settings lost in static pod migration
* Migration from a self-hosted to a static pod control plane dropped
a few kube-controller-manager customizations
* Reduce kube-controller-manager --pod-eviction-timeout from 5m to 1m
to move pods more quickly when nodes are preempted
* Fix flex-volume-plugin-dir since the Kubernetes default points to
a read-only filesystem on Container Linux / Fedora CoreOS

Related:

* https://github.com/poseidon/terraform-render-bootstrap/pull/148
* 7b06557b7a
2019-12-08 22:37:36 -08:00
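In flag form, the restored customizations are roughly:

```
command:
  - kube-controller-manager
  - --pod-eviction-timeout=1m0s                              # down from the 5m default
  - --flex-volume-plugin-dir=/var/lib/kubelet/volumeplugins  # writable on Container Linux / Fedora CoreOS
```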
Dalton Hubble
7df6bd8d1e Tune static pod CPU requests slightly lower
* Reduce kube-apiserver and kube-controller-manager CPU
requests from 200m to 150m. Prefer slightly lower commitment
after running with the requests chosen in #161 for a while
* Reduce calico-node CPU request from 150m to 100m to match
CoreDNS and flannel
2019-12-08 22:25:58 -08:00
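The resulting request shape on each static Pod container is simply the following (kube-apiserver shown; no memory request is set, per the rationale in the next entry):

```
resources:
  requests:
    cpu: 150m   # kube-apiserver / kube-controller-manager after this change
```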
Dalton Hubble
0f1f16c612 Add small CPU resource requests to static pods
* Set small CPU requests on static pods kube-apiserver,
kube-controller-manager, and kube-scheduler to align with
upstream tooling and for edge cases
* Control plane nodes are tainted to isolate them from
ordinary workloads. Even dense workloads can only compress
CPU resources on worker nodes.
* Control plane static pods use the highest priority class, so
contention favors control plane pods (over say node-exporter)
and CPU is compressible too.
* Effectively, a practical case for these requests hasn't been
observed. However, a small static pod CPU request may offer
a slight benefit if a controller became overloaded and the
above mechanisms were insufficient for some reason (bit of a
stretch, due to CPU compressibility)
* Continue to avoid setting a memory request for static pods.
It would impose a hard size requirement on controller nodes,
which isn't warranted and is handled more gently by Typhoon
default instance types across clouds and via docs
2019-11-13 16:44:33 -08:00
Dalton Hubble
43e1230c55 Update CoreDNS from v1.6.2 to v1.6.5
* Add health `lameduck` option 5s. Before CoreDNS shuts down,
it will wait and report unhealthy for 5s to allow time for
plugins to shut down cleanly
* Minor bug fixes over a few releases
* https://coredns.io/2019/08/31/coredns-1.6.3-release/
* https://coredns.io/2019/09/27/coredns-1.6.4-release/
* https://coredns.io/2019/11/05/coredns-1.6.5-release/
2019-11-13 14:33:50 -08:00
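In the CoreDNS ConfigMap, the health plugin gains the lameduck option, roughly as below (the other Corefile plugins shown are illustrative, not the exact rendered config):

```
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        health {
            lameduck 5s   # report unhealthy for 5s before shutting down
        }
        kubernetes cluster.local in-addr.arpa ip6.arpa
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
    }
```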
Dalton Hubble
3c7334ab55 Upgrade Calico from v3.9.2 to v3.10.0
* Change calico-node livenessProbe from httpGet to an exec of
`calico-node -felix-ready`, as recommended by Calico
* Allow advertising Kubernetes service ClusterIPs
2019-10-27 01:06:09 -07:00
Dalton Hubble
e09d6bef33 Switch kube-proxy from iptables mode to ipvs mode
* Kubernetes v1.11 considered kube-proxy IPVS mode GA
* Many problems were found https://github.com/poseidon/typhoon/pull/321
* Since then, major blockers seem to have been addressed
2019-10-15 22:55:17 -07:00
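The switch itself is a one-flag change on the kube-proxy DaemonSet command, sketched here:

```
command:
  - kube-proxy
  - --proxy-mode=ipvs   # previously iptables (the default)
```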
Dalton Hubble
1f8b634652 Remove unneeded control plane flags
* Several flags now default to the arguments we've been
setting and are no longer needed
2019-10-06 20:25:46 -07:00