116 Commits

Author SHA1 Message Date
Dalton Hubble
4369c706e2 Restore kube-controller-manager settings lost in static pod migration
* Migration from a self-hosted to a static pod control plane dropped
a few kube-controller-manager customizations
* Reduce kube-controller-manager --pod-eviction-timeout from 5m to 1m
to move pods more quickly when nodes are preempted
* Fix flex-volume-plugin-dir since the Kubernetes default points to
a read-only filesystem on Container Linux / Fedora CoreOS

Related:

* https://github.com/poseidon/terraform-render-bootstrap/pull/148
* 7b06557b7a
2019-12-08 22:37:36 -08:00
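A minimal sketch of the restored kube-controller-manager flags as static pod container args; the flex-volume path is an assumed writable location, not taken from this log.

```yaml
# kube-controller-manager static pod args (sketch)
command:
  - kube-controller-manager
  - --pod-eviction-timeout=1m0s                              # upstream default is 5m0s
  - --flex-volume-plugin-dir=/var/lib/kubelet/volumeplugins  # assumed writable path; the upstream
                                                             # default is read-only on these OSes
```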
Dalton Hubble
7df6bd8d1e Tune static pod CPU requests slightly lower
* Reduce the kube-apiserver and kube-controller-manager CPU
requests from 200m to 150m. Prefer a slightly lower commitment
after running with the requests chosen in #161 for a while
* Reduce the calico-node CPU request from 150m to 100m to match
CoreDNS and flannel
2019-12-08 22:25:58 -08:00
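For reference, a sketch of the resulting request on a control plane static pod container; no CPU limits and no memory requests are set.

```yaml
# e.g. kube-apiserver / kube-controller-manager container (sketch)
resources:
  requests:
    cpu: 150m   # calico-node requests 100m, matching CoreDNS and flannel
```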
Dalton Hubble
0f1f16c612 Add small CPU resource requests to static pods
* Set small CPU requests on static pods kube-apiserver,
kube-controller-manager, and kube-scheduler to align with
upstream tooling and for edge cases
* Control plane nodes are tainted to isolate them from
ordinary workloads. Even dense workloads can only compress
CPU resources on worker nodes.
* Control plane static pods use the highest priority class, so
contention favors control plane pods (over say node-exporter)
and CPU is compressible too.
* Effectively, a practical case for these requests hasn't been
observed. However, a small static pod CPU request may offer
a slight benefit if a controller became overloaded and the
above mechanisms were insufficient for some reason (bit of a
stretch, due to CPU compressibility)
* Continue to avoid setting memory requests for static pods.
They would impose a hard sizing requirement on controller nodes,
which isn't warranted; sizing is handled more gently via Typhoon's
default instance types across clouds and via docs
2019-11-13 16:44:33 -08:00
Dalton Hubble
43e1230c55 Update CoreDNS from v1.6.2 to v1.6.5
* Add the health plugin's `lameduck` option, set to 5s. Before CoreDNS
shuts down, it will wait and report unhealthy for 5s to allow time for
plugins to shut down cleanly
* Minor bug fixes over a few releases
* https://coredns.io/2019/08/31/coredns-1.6.3-release/
* https://coredns.io/2019/09/27/coredns-1.6.4-release/
* https://coredns.io/2019/11/05/coredns-1.6.5-release/
2019-11-13 14:33:50 -08:00
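A sketch of the Corefile fragment this adds, assuming the usual coredns ConfigMap layout:

```yaml
data:
  Corefile: |
    .:53 {
        health {
            lameduck 5s
        }
        # ... other plugins unchanged
    }
```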
Dalton Hubble
3c7334ab55 Upgrade Calico from v3.9.2 to v3.10.0
* Change the calico-node livenessProbe from an httpGet to an exec of
`calico-node -felix-ready`, as recommended by Calico
* Allow advertising Kubernetes service ClusterIPs
2019-10-27 01:06:09 -07:00
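A sketch of the new probe on the calico-node container; the binary path and timings are assumptions based on typical Calico manifests.

```yaml
livenessProbe:
  exec:
    command:
      - /bin/calico-node
      - -felix-ready
  initialDelaySeconds: 10
  periodSeconds: 10
```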
Dalton Hubble
e09d6bef33 Switch kube-proxy from iptables mode to ipvs mode
* Kubernetes v1.11 considered kube-proxy IPVS mode GA
* Many problems were found https://github.com/poseidon/typhoon/pull/321
* Since then, major blockers seem to have been addressed
2019-10-15 22:55:17 -07:00
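The switch amounts to one kube-proxy flag; a sketch of the DaemonSet container args (other flags omitted):

```yaml
command:
  - kube-proxy
  - --proxy-mode=ipvs   # was iptables; IPVS mode requires the ip_vs kernel modules on nodes
```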
Dalton Hubble
1f8b634652 Remove unneeded control plane flags
* Several flags now default to the arguments we've been
setting and are no longer needed
2019-10-06 20:25:46 -07:00
Dalton Hubble
18b7a74d30 Update Calico from v3.8.2 to v3.9.1
* https://docs.projectcalico.org/v3.9/release-notes/
2019-09-29 11:14:20 -07:00
Dalton Hubble
6e59af7113 Migrate from a self-hosted to static pod control plane
* Run kube-apiserver, kube-scheduler, and kube-controller-manager
as static pods on each controller node
* Bootstrap a minimal control plane by copying `static-manifests`
to the Kubelet `--pod-manifest-path` and tls/auth secrets to
`/etc/kubernetes/bootstrap-secrets`. Then, kubectl apply Kubernetes
manifests.
* Discontinue using bootkube to bootstrap and pivot to a self-hosted
control plane.
* Remove bootkube self-hosted kube-apiserver DaemonSet and
kube-scheduler and kube-controller-manager Deployments
* Remove pod-checkpointer manifests (no longer needed)

Advantages:

* Reduce control plane bootstrapping complexity. The self-hosted pivot
and pod checkpointing worked well, but in-place edits to kube-apiserver,
kube-controller-manager, or kube-scheduler are infrequently used. The
concept was originally geared toward continuously upgrading clusters
in place, a goal Typhoon doesn't take on (blue/green clusters are
recommended instead). As such, the value-add doesn't justify the extra
components for this particular project.
* Static pods still provide kubectl visibility and log access

Drawbacks:

* In-place edits to kube-apiserver, kube-controller-manager, and
kube-scheduler are not possible via kubectl (non-goal)
* Assets must be copied to each controller (not just one)
* Static pods must load credentials via hostPath mounts, which is less
clean than the former Kubernetes Secrets and ServiceAccounts
2019-09-02 20:52:46 -07:00
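A minimal sketch of the arrangement described above: a static pod manifest placed in the Kubelet `--pod-manifest-path` that reads TLS/auth material from a hostPath. The image tag, command form, and omitted flags are illustrative.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  hostNetwork: true
  priorityClassName: system-cluster-critical
  containers:
    - name: kube-apiserver
      image: k8s.gcr.io/hyperkube:v1.15.3        # illustrative
      command:
        - kube-apiserver
        - --secure-port=6443
        # ... remaining flags elided
      volumeMounts:
        - name: secrets
          mountPath: /etc/kubernetes/bootstrap-secrets
          readOnly: true
  volumes:
    - name: secrets
      hostPath:
        path: /etc/kubernetes/bootstrap-secrets
```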
Dalton Hubble
119cb00fa7 Upgrade Calico from v3.7.4 to v3.8.0
* Enable CNI bandwidth plugin for traffic shaping
* https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/#support-traffic-shaping
2019-07-11 21:00:58 -07:00
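With the bandwidth plugin chained into the CNI config, pods can opt into traffic shaping via the standard annotations; pod name and image below are illustrative.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shaped
  annotations:
    kubernetes.io/ingress-bandwidth: "10M"
    kubernetes.io/egress-bandwidth: "10M"
spec:
  containers:
    - name: app
      image: nginx
```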
Dalton Hubble
4caca47776 Run kube-apiserver as non-root user (nobody) 2019-07-06 13:51:54 -07:00
Dalton Hubble
3bfd1253ec Always run kube-apiserver on port 6443 (internally)
* Require the bootstrap-kube-apiserver and kube-apiserver components
to listen on port 6443 (internally) to allow kube-apiserver pods to
run with lower user privilege
* Remove variable `apiserver_port`. The kube-apiserver listen
port is no longer customizable.
* Add variable `external_apiserver_port` to allow architectures
where a load balancer fronts kube-apiserver 6443 backends, but
listens on a different port externally. For example, Google Cloud
TCP Proxy load balancers cannot listen on 6443
2019-07-06 13:50:22 -07:00
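A sketch combining this change with the non-root commit above: a fixed, unprivileged internal port lets the kube-apiserver container drop root.

```yaml
containers:
  - name: kube-apiserver
    command:
      - kube-apiserver
      - --secure-port=6443
    securityContext:
      runAsNonRoot: true
      runAsUser: 65534   # nobody (assumed UID)
```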
Dalton Hubble
62df9ad69c Update Kubernetes from v1.14.3 to v1.15.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.15.md#v1150
2019-06-23 13:04:13 -07:00
Dalton Hubble
efd1cfd9bf Update CoreDNS from v1.3.1 to v1.5.0
* Add the `ready` plugin and change the readinessProbe to check its
default port 8181 to ensure all plugins are ready
* `upstream [ADDRESS]` defines upstream resolvers for external
services. If no address is given, resolution is against CoreDNS
itself, which is the default. So `upstream` can be removed
2019-05-27 00:07:59 -07:00
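A sketch of the probe that pairs with the `ready` plugin (it serves /ready on port 8181 once all plugins have started):

```yaml
readinessProbe:
  httpGet:
    path: /ready
    port: 8181
    scheme: HTTP
```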
Dalton Hubble
fc7a6fb20a Change flannel port from 8472 to 4789
* Change flannel port from the kernel default 8472 to the
IANA assigned VXLAN port 4789
* Requires a change to firewall rules or security groups
depending on the platform (**action required!**)
* Why now? Calico now offers its own VXLAN backend so
standardizing on the IANA port simplifies configuration
* https://github.com/coreos/flannel/blob/master/Documentation/backends.md#vxlan
2019-05-06 21:23:08 -07:00
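A sketch of the flannel ConfigMap fragment after the change; the Network value is illustrative.

```yaml
data:
  net-conf.json: |
    {
      "Network": "10.2.0.0/16",
      "Backend": {
        "Type": "vxlan",
        "Port": 4789
      }
    }
```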
Dalton Hubble
b96d641f6d Update Calico from v3.6.1 to v3.7.0
* Accept a `network_encapsulation` variable to choose whether the
default IPPool should use ipip (default) or vxlan encapsulation
* Use `network_mtu` as the MTU for workload interfaces for ipip
or vxlan (although Calico can have IPPools with a mix, we pick
ipip xor vxlan; see the IPPool sketch below)
2019-05-05 20:41:53 -07:00
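A sketch of the default IPPool when ipip is selected, assuming Calico's Kubernetes-datastore CRDs; the CIDR is illustrative, and `vxlanMode: Always` (with `ipipMode: Never`) would be set instead when vxlan is chosen.

```yaml
apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 10.2.0.0/16
  blockSize: 24        # keep /24 blocks (matches the Kubernetes 110 pods-per-node default)
  ipipMode: Always
  vxlanMode: Never
  natOutgoing: true
```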
Dalton Hubble
b9bef14a0b Add enable_aggregation option (defaults to false)
* Add an `enable_aggregation` variable to enable the kube-apiserver
aggregation layer for adding extension apiservers to clusters
* Aggregation is **disabled** by default. Typhoon recommends you not
enable aggregation. Consider whether less invasive ways to achieve
your goals are possible and whether those goals are well-founded
* Enabling aggregation and extension apiservers increases the attack
surface of a cluster and makes extensions a part of the control plane.
Admins must scrutinize and trust any extension apiserver used.
* Passing a v1.14 CNCF conformance test requires aggregation be enabled.
Having an option for aggregation keeps compliance, but retains the stricter
security posture on default clusters
2019-04-07 02:27:40 -07:00
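When `enable_aggregation=true`, the kube-apiserver gains the standard aggregation layer flags, roughly as sketched below; the certificate paths are assumptions.

```yaml
# extra kube-apiserver args (sketch)
- --requestheader-client-ca-file=/etc/kubernetes/secrets/aggregation-ca.crt
- --requestheader-allowed-names=aggregator
- --requestheader-extra-headers-prefix=X-Remote-Extra-
- --requestheader-group-headers=X-Remote-Group
- --requestheader-username-headers=X-Remote-User
- --proxy-client-cert-file=/etc/kubernetes/secrets/aggregation-client.crt
- --proxy-client-key-file=/etc/kubernetes/secrets/aggregation-client.key
```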
Dalton Hubble
9862888bb2 Reduce calico-node CPU request from 250m to 150m
* calico-node uses only a small fraction of its CPU request
(i.e. reservation) even under stress. The unbounded limit
already allows usage to scale favorably in bursty cases
* Motivation: On instance types that skew memory-optimized
(e.g. GCP n1), over-requesting can push the system toward
overcommitment (alerts can be tuned)
* Overcommitment is not necessarily bad, but 250m seems too
generous a minimum given the actual usage
2019-03-24 11:55:56 -07:00
Dalton Hubble
23f81a5e8c Upgrade Calico from v3.5.2 to v3.6.0
* Add calico-ipam CRDs and RBAC permissions
* Switch IPAM from host-local to calico-ipam!
  * `calico-ipam` subnets `ippools` (defaults to pod CIDR) into
`ipamblocks` (defaults to /26, but set to /24 in Typhoon)
  * `host-local` subnets the pod CIDR based on the node PodCIDR
field (set via kube-controller-manager as /24's)
* Create a custom default IPv4 IPPool to ensure the block size
is kept at /24 to allow 110 pods per node (Kubernetes default)
* Retaining host-local was slightly preferred, but Calico v3.6
is migrating all usage to calico-ipam. The codepath that skipped
calico-ipam for KDD was removed
* https://docs.projectcalico.org/v3.6/release-notes/
2019-03-18 22:28:48 -07:00
Dalton Hubble
6cda319b9d Revert "Update Calico from v3.5.2 to v3.6.0"
* Calico is not using host-local IPAM as desired
* This reverts commit e6e051ef47.
2019-03-18 21:32:23 -07:00
Dalton Hubble
e6e051ef47 Update Calico from v3.5.2 to v3.6.0
* Add calico-ipam CRDs and RBAC permissions
* Continue using host-local IPAM
* https://docs.projectcalico.org/v3.6/release-notes/
2019-03-18 21:03:27 -07:00
Dalton Hubble
1528266595 Resolve in-addr.arpa and ip6.arpa zones with CoreDNS kubernetes plugin
* Resolve in-addr.arpa and ip6.arpa DNS PTR requests for Kubernetes
service IPs and pod IPs
* Previously, CoreDNS was configured to resolve in-addr.arpa PTR
records for service IPs (but not pod IPs)
2019-03-04 22:33:21 -08:00
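A sketch of the kubernetes plugin stanza after the change, assuming the cluster.local cluster domain:

```yaml
data:
  Corefile: |
    .:53 {
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            fallthrough in-addr.arpa ip6.arpa
        }
        # ... other plugins unchanged
    }
```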
Dalton Hubble
593f0e3655 Add a readinessProbe to CoreDNS
* https://github.com/kubernetes/kubernetes/pull/74137
2019-02-23 13:11:19 -08:00
Dalton Hubble
c5f5aacce9 Assign Pod Priority Classes to control plane components
* Priority Admission Controller has been enabled since Typhoon
v1.11.1
* Assign cluster and node components a builtin priorityClassName
(a higher value means higher priority) to inform scheduler preemption,
scheduling order, and node out-of-resource eviction order
2019-02-17 17:12:46 -08:00
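A sketch of the assignment: cluster-wide components get system-cluster-critical, per-node agents get system-node-critical.

```yaml
# cluster-wide components, e.g. kube-apiserver, CoreDNS (sketch)
spec:
  priorityClassName: system-cluster-critical
---
# per-node agents, e.g. calico-node, kube-proxy (sketch)
spec:
  priorityClassName: system-node-critical
```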
Dalton Hubble
7dc8f8bf8c Switch CoreDNS to use the forward plugin instead of proxy
* Use the forward plugin to forward to upstream resolvers, instead
of the proxy plugin. The forward plugin is reported to be a faster
alternative since it can re-use open sockets
* https://coredns.io/explugins/forward/
* https://coredns.io/plugins/proxy/
* https://github.com/kubernetes/kubernetes/issues/73254
2019-01-30 22:19:13 -08:00
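A sketch of the one-line Corefile change:

```yaml
data:
  Corefile: |
    .:53 {
        forward . /etc/resolv.conf   # replaces: proxy . /etc/resolv.conf
        # ... other plugins unchanged
    }
```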
Dalton Hubble
7b06557b7a Reduce kube-controller-manager --pod-eviction-timeout to 1m
* Pods on preempted nodes should be moved to healthy nodes
more quickly (1 min instead of 5 minutes)
2019-01-27 16:20:01 -08:00
Dalton Hubble
e892e291b5 Restore Kubelet authorization to delete nodes
* Fix a regression from limiting the Kubelet TLS client
certificate to the system:nodes group (#100): dropping
cluster-admin also dropped the Kubelet's ability to delete nodes.
* On clouds where workers can scale down (manual terraform apply,
AWS spot termination, Azure low priority deletion), worker shutdown
runs delete-node.service to remove the node and prevent NotReady
nodes from accumulating
* Allow Kubelets to delete cluster nodes via the system:nodes group. Kubelets
acting with the system:node and kubelet-delete ClusterRoles are still an
improvement over acting as cluster-admin (see the RBAC sketch below)
2019-01-14 23:26:41 -08:00
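A sketch of the kubelet-delete ClusterRole and its binding to the system:nodes group; exact manifest names may differ.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kubelet-delete
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kubelet-delete
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kubelet-delete
subjects:
  - kind: Group
    name: system:nodes
    apiGroup: rbac.authorization.k8s.io
```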
Dalton Hubble
f1e69f1d93 Re-enable kube-scheduler and kube-controller-manager HTTP ports
* Fix a regression added in 48730c0f12; allow Prometheus to scrape
metrics from kube-scheduler and kube-controller-manager
2019-01-11 23:52:57 -08:00
Dalton Hubble
48730c0f12 Probe kube-scheduler and kube-controller-manager HTTPS ports
* Disable kube-scheduler and kube-controller-manager HTTP ports
2019-01-09 20:50:57 -08:00
Dalton Hubble
0e65e3567e Enable certificates.k8s.io API certificate issuance
* Allow kube-controller-manager to sign approved CSRs using the
cluster CA private key to issue cluster certificates
* System components that need to use certificates signed by the
cluster CA can submit a CSR to the apiserver, have an admin
inspect and manually approve it, and be issued a certificate
* Admins should inspect CSRs very carefully to ensure their
origin and authorization level are appropriate
* https://kubernetes.io/docs/tasks/tls/managing-tls-in-a-cluster/#approving-certificate-signing-requests
2019-01-06 17:17:03 -08:00
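The signer is kube-controller-manager using the cluster CA; a sketch of the relevant flags (secret paths are assumptions) with the manual approval step noted in comments.

```yaml
# kube-controller-manager args (sketch)
- --cluster-signing-cert-file=/etc/kubernetes/secrets/ca.crt
- --cluster-signing-key-file=/etc/kubernetes/secrets/ca.key
# admin workflow: kubectl get csr; kubectl describe csr NAME; kubectl certificate approve NAME
```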
Dalton Hubble
4f8952a956 Disable anonymous auth on the bootstrap kube-apiserver
* Anonymous auth isn't used during bootstrapping and can
be disabled
2019-01-05 21:48:40 -08:00
Dalton Hubble
ea30087577 Structure control plane manifests neatly 2019-01-05 21:47:30 -08:00
Dalton Hubble
847ec5929b Consolidate both variants of the admin kubeconfig
* Provide an admin kubeconfig which includes a named context
and also sets that context as the current-context
* Retains support both for the KUBECONFIG=path style of usage and for
adding many kubeconfigs to a ~/.kube/configs folder and using
`kubectl config use-context CLUSTER-context`
2019-01-05 14:56:45 -08:00
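A sketch of the consolidated kubeconfig shape; cluster name, server address, and credentials are illustrative.

```yaml
apiVersion: v1
kind: Config
clusters:
  - name: my-cluster
    cluster:
      server: https://my-cluster.example.com:6443
      certificate-authority-data: <base64 CA>
users:
  - name: my-cluster-admin
    user:
      client-certificate-data: <base64 cert>
      client-key-data: <base64 key>
contexts:
  - name: my-cluster-context
    context:
      cluster: my-cluster
      user: my-cluster-admin
current-context: my-cluster-context
```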
Dalton Hubble
f5ea389e8c Update CoreDNS from v1.2.6 to v1.3.0
* https://coredns.io/2018/12/15/coredns-1.3.0-release/
* Limit log plugin to just log error class
2019-01-05 13:21:10 -08:00
Dalton Hubble
a7bd306679 Add admin kubeconfig and limit Kubelet cert to system:nodes group
* Change Kubelet TLS client certificate to belong to the system:nodes
group instead of the system:masters group (more limited)
* Bind the system:node ClusterRole to the system:nodes group (yes,
the ClusterRole is singular)
* Generate separate admin.crt and admin.key files (which do still use
system:masters). Output kubeconfig-kubelet and kubeconfig-admin values
from the module
* Remove the kubeconfig output to force users to pick the correct
kubeconfig, depending on how the output is used (action required!)

Related:

* https://kubernetes.io/docs/reference/access-authn-authz/rbac/#core-component-roles

Note, NodeAuthorizer/NodeRestriction would be an enhancement, but to
work across platforms it effectively requires TLS bootstrapping, which
doesn't have a viable attestation strategy and clashes with CCM. This
change tightens Kubelet permissions, but intentionally doesn't aim to
steer toward NodeAuthorizer/NodeRestriction
2019-01-02 23:08:09 -08:00
Dalton Hubble
7bcca25043 Use a kube-apiserver ServiceAccount and ClusterRoleBinding
* Switch kube-apiserver from using the kube-system default ServiceAccount
(with cluster-admin) to using a kube-apiserver ServiceAccount bound to
cluster-admin (as before)
* Remove the default-sa ClusterRoleBinding that allowed kube-apiserver
and kube-scheduler (or other 3rd-party components added to kube-system)
to use the kube-system default ServiceAccount for cluster-admin
* Require all future components in kube-system define their own
ServiceAccount
2019-01-01 17:30:28 -08:00
Dalton Hubble
fa4c2d8a68 Use a kube-scheduler ServiceAccount and ClusterRoleBinding
* Switch kube-scheduler from using the kube-system default ServiceAccount
(with cluster-admin) to using a kube-scheduler ServiceAccount bound to
the builtin system:kube-scheduler and system:volume-scheduler
(required for StorageClass) ClusterRoles
* https://kubernetes.io/docs/reference/access-authn-authz/rbac/#core-component-roles
2019-01-01 17:29:36 -08:00
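A sketch of the ServiceAccount and its bindings to the builtin roles named above; the binding names are assumptions.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-scheduler
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-scheduler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:kube-scheduler
subjects:
  - kind: ServiceAccount
    name: kube-scheduler
    namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-scheduler-volume
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:volume-scheduler
subjects:
  - kind: ServiceAccount
    name: kube-scheduler
    namespace: kube-system
```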
Dalton Hubble
d14348a368 Update Calico from v3.3.2 to v3.4.0
* Use an init container to install CNI plugins
* Update the calico-node ClusterRole
2018-12-15 18:04:25 -08:00
Dalton Hubble
b101fddf6e Configure kube-router to use in-cluster-kubeconfig
* Use the access token, but reach the apiserver via the apiserver
endpoint rather than the internal service IP
2018-12-06 22:39:59 -08:00
Dalton Hubble
cff13f9248 Update hyperkube from v1.12.3 to v1.13.0
* Remove controller-manager empty dir mount added for v1.12
https://github.com/kubernetes/kubernetes/issues/68973
* No longer required https://github.com/kubernetes/kubernetes/pull/69884
2018-12-03 20:42:14 -08:00
Dalton Hubble
9d6f0c31d3 Add experimental kube-router CNI provider
* Allow using kube-router for pod-to-pod networking
and for NetworkPolicy
2018-12-03 19:42:02 -08:00
Dalton Hubble
bffb5d5d23 Update pod-checkpointer image to query Kubelet secure api
* Updates pod-checkpointer to prefer the Kubelet secure
API (before falling back to the Kubelet read-only API, which has
been disabled on Typhoon clusters since
https://github.com/poseidon/typhoon/pull/324)
* Previously, pod-checkpointer checkpointed an initial set
of pods during bootstrapping so recovery from power cycling
clusters was unaffected, but logs were noisy
* https://github.com/kubernetes-incubator/bootkube/pull/1027
* https://github.com/kubernetes-incubator/bootkube/pull/1025
2018-11-26 20:11:01 -08:00
Dalton Hubble
dbf67da1cb Disable Calico usage reporting by default
* Calico Felix has been reporting anonymous usage data about
Calico version and cluster size
* https://docs.projectcalico.org/v3.3/reference/felix/configuration
* Add an enable_reporting variable and default to false
2018-11-18 23:41:19 -08:00
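Reporting is toggled through Felix's environment on the calico-node DaemonSet; a sketch:

```yaml
env:
  - name: FELIX_USAGEREPORTINGENABLED
    value: "false"   # "true" when enable_reporting is set
```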
Dalton Hubble
39f9afb336 Add resource request to flannel and mount /run/flannel
* Request 100m CPU without a limit (similar to Calico)
2018-11-11 15:56:13 -08:00
Dalton Hubble
3f3ab6b5c0 Enable CoreDNS loop and loadbalance plugins
* loop sends an initial query to detect infinite forwarding
loops among configured upstream DNS servers and exits quickly with
an error if one is found (it's a fatal misconfiguration on the network
that would otherwise cause resolvers to consume memory/CPU until
crashing, masking the problem)
* https://github.com/coredns/coredns/tree/master/plugin/loop
* loadbalance randomizes the ordering of A, AAAA, and MX records
in responses to provide round-robin load balancing (as usual,
clients may still cache responses though)
* https://github.com/coredns/coredns/tree/master/plugin/loadbalance
2018-11-10 17:33:30 -08:00
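A sketch of the Corefile additions; both plugins take no required arguments here.

```yaml
data:
  Corefile: |
    .:53 {
        loop          # fail fast if an upstream forwards queries back to CoreDNS
        loadbalance   # shuffle A/AAAA/MX record order in answers
        # ... other plugins unchanged
    }
```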
Dalton Hubble
d045a8e6b8 Structure flannel/Calico manifests consistently
* Organize flannel and Calico manifests to use consistent
naming, structure, and ordering so they align with each other
* Downside: Makes direct diff'ing with upstream harder, but
that's become difficult lately anyway, since Calico uses a
templating engine
2018-11-10 13:14:36 -08:00
Dalton Hubble
365d089610 Set kube-apiserver's kubelet preferred address types
* Prefer InternalIP and ExternalIP over the node's hostname,
to match upstream behavior and kubeadm
* Previously, hostname-override was used to set node names
to internal IP's to work around some cloud providers not
resolving hostnames for instances (e.g. DO droplets)
2018-11-03 14:58:30 -07:00
Dalton Hubble
6a77775e52 Update CoreDNS from v1.2.2 to v1.2.4
* https://coredns.io/2018/10/17/coredns-1.2.4-release/
* https://coredns.io/2018/10/16/coredns-1.2.3-release/
2018-10-27 15:35:21 -07:00
Dalton Hubble
e0e5577d37 Update Calico from v3.2.3 to v3.3.0
* https://docs.projectcalico.org/v3.3/releases/
2018-10-23 20:26:48 -07:00
Dalton Hubble
79065baa8c Fix CoreDNS AntiAffinity to prefer spreading pods 2018-10-17 22:15:53 -07:00