terraform-render-bootstrap


mirror of https://github.com/outbackdingo/terraform-render-bootstrap.git synced 2026-01-27 18:20:40 +00:00

Author	SHA1	Message	Date
Dalton Hubble	4369c706e2	Restore kube-controller-manager settings lost in static pod migration * Migration from a self-hosted to a static pod control plane dropped a few kube-controller-manager customizations * Reduce kube-controller-manager --pod-eviction-timeout from 5m to 1m to move pods more quickly when nodes are preempted * Fix flex-volume-plugin-dir since the Kubernetes default points to a read-only filesystem on Container Linux / Fedora CoreOS Related: * https://github.com/poseidon/terraform-render-bootstrap/pull/148 * `7b06557b7a`	2019-12-08 22:37:36 -08:00
Dalton Hubble	7df6bd8d1e	Tune static pod CPU requests slightly lower * Reduce kube-apiserver and kube-controller-manager CPU requests from 200m to 150m. Prefer slightly lower commitment after running with the requests chosen in #161 for a while * Reduce calico-node CPU request from 150m to 100m to match CoreDNS and flannel	2019-12-08 22:25:58 -08:00
Dalton Hubble	0f1f16c612	Add small CPU resource requests to static pods * Set small CPU requests on static pods kube-apiserver, kube-controller-manager, and kube-scheduler to align with upstream tooling and for edge cases * Control plane nodes are tainted to isolate them from ordinary workloads. Even dense workloads can only compress CPU resources on worker nodes. * Control plane static pods use the highest priority class, so contention favors control plane pods (over say node-exporter) and CPU is compressible too. * Effectively, a practical case for these requests hasn't been observed. However, a small static pod CPU request may offer a slight benefit if a controller became overloaded and the above mechanisms were insufficient for some reason (bit of a stretch, due to CPU compressibility) * Continue to avoid setting a memory request for static pods. It would impose a hard size requirement on controller nodes, which isn't warranted and is handled more gently by Typhoon default instance types across clouds and via docs	2019-11-13 16:44:33 -08:00
Dalton Hubble	43e1230c55	Update CoreDNS from v1.6.2 to v1.6.5 * Add health `lameduck` option 5s. Before CoreDNS shuts down, it will wait and report unhealthy for 5s to allow time for plugins to shutdown cleanly * Minor bug fixes over a few releases * https://coredns.io/2019/08/31/coredns-1.6.3-release/ * https://coredns.io/2019/09/27/coredns-1.6.4-release/ * https://coredns.io/2019/11/05/coredns-1.6.5-release/	2019-11-13 14:33:50 -08:00
Dalton Hubble	3c7334ab55	Upgrade Calico from v3.9.2 to v3.10.0 * Change calico-node livenessProbe from httpGet to exec `calico-node -felix-ready`, as recommended by Calico * Allow advertising Kubernetes service ClusterIPs	2019-10-27 01:06:09 -07:00
Dalton Hubble	e09d6bef33	Switch kube-proxy from iptables mode to ipvs mode * Kubernetes v1.11 considered kube-proxy IPVS mode GA * Many problems were found https://github.com/poseidon/typhoon/pull/321 * Since then, major blockers seem to have been addressed	2019-10-15 22:55:17 -07:00
Dalton Hubble	1f8b634652	Remove unneeded control plane flags * Several flags now default to the arguments we've been setting and are no longer needed	2019-10-06 20:25:46 -07:00
Dalton Hubble	18b7a74d30	Update Calico from v3.8.2 to v3.9.1 * https://docs.projectcalico.org/v3.9/release-notes/	2019-09-29 11:14:20 -07:00
Dalton Hubble	6e59af7113	Migrate from a self-hosted to static pod control plane * Run kube-apiserver, kube-scheduler, and kube-controller-manager as static pods on each controller node * Bootstrap a minimal control plane by copying `static-manifests` to the Kubelet `--pod-manifest-path` and tls/auth secrets to `/etc/kubernetes/bootstrap-secrets`. Then, kubectl apply Kubernetes manifests. * Discontinue using bootkube to bootstrap and pivot to a self-hosted control plane. * Remove bootkube self-hosted kube-apiserver DaemonSet and kube-scheduler and kube-controller-manager Deployments * Remove pod-checkpointer manifests (no longer needed) Advantages: * Reduce control plane bootstrapping complexity. Self-hosted pivot and pod checkpointing worked well, but in-place edits to kube-apiserver, kube-controller-manager, or kube-scheduler are infrequently used. The concept was originally geared toward continuously in-place upgrading clusters, a goal Typhoon doesn't take on (rec. blue/green clusters). As such, the value-add isn't justifying the extra components for this particular project. * Static pods still provide kubectl visibility and log access Drawbacks: * In-place edits to kube-apiserver, kube-controller-manager, and kube-scheduler are not possible via kubectl (non-goal) * Assets must be copied to each controller (not just one) * Static pods must load credentials via hostPath, which is less clean compared with the former Kubernetes secrets and service accounts	2019-09-02 20:52:46 -07:00
Dalton Hubble	119cb00fa7	Upgrade Calico from v3.7.4 to v3.8.0 * Enable CNI bandwidth plugin for traffic shaping * https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/#support-traffic-shaping	2019-07-11 21:00:58 -07:00
Dalton Hubble	4caca47776	Run kube-apiserver as non-root user (nobody)	2019-07-06 13:51:54 -07:00
Dalton Hubble	3bfd1253ec	Always run kube-apiserver on port 6443 (internally) * Require bootstrap-kube-apiserver and kube-apiserver components listen on port 6443 (internally) to allow kube-apiserver pods to run with lower user privilege * Remove variable `apiserver_port`. The kube-apiserver listen port is no longer customizable. * Add variable `external_apiserver_port` to allow architectures where a load balancer fronts kube-apiserver 6443 backends, but listens on a different port externally. For example, Google Cloud TCP Proxy load balancers cannot listen on 6443	2019-07-06 13:50:22 -07:00
Dalton Hubble	62df9ad69c	Update Kubernetes from v1.14.3 to v1.15.0 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.15.md#v1150	2019-06-23 13:04:13 -07:00
Dalton Hubble	efd1cfd9bf	Update CoreDNS from v1.3.1 to v1.5.0 * Add `ready` plugin and change the readinessProbe to check default port 8181 to ensure all plugins are ready * `upstream [ADDRESS]` defines upstream resolvers for external services. If no address is given, resolution is against CoreDNS itself, which is the default. So `upstream` can be removed	2019-05-27 00:07:59 -07:00
Dalton Hubble	fc7a6fb20a	Change flannel port from 8472 to 4789 * Change flannel port from the kernel default 8472 to the IANA assigned VXLAN port 4789 * Requires a change to firewall rules or security groups depending on the platform (action required!) * Why now? Calico now offers its own VXLAN backend so standardizing on the IANA port simplifies configuration * https://github.com/coreos/flannel/blob/master/Documentation/backends.md#vxlan	2019-05-06 21:23:08 -07:00
Dalton Hubble	b96d641f6d	Update Calico from v3.6.1 to v3.7.0 * Accept a `network_encapsulation` variable to choose whether the default IPPool should use ipip (default) or vxlan encapsulation * Use `network_mtu` as the MTU for workload interfaces for ipip or vxlan (although Calico can have IPPools with a mix, we're picking ipip xor vxlan)	2019-05-05 20:41:53 -07:00
Dalton Hubble	b9bef14a0b	Add enable_aggregation option (defaults to false) * Add an `enable_aggregation` variable to enable the kube-apiserver aggregation layer for adding extension apiservers to clusters * Aggregation is disabled by default. Typhoon recommends you not enable aggregation. Consider whether less invasive ways to achieve your goals are possible and whether those goals are well-founded * Enabling aggregation and extension apiservers increases the attack surface of a cluster and makes extensions a part of the control plane. Admins must scrutinize and trust any extension apiserver used. * Passing a v1.14 CNCF conformance test requires aggregation be enabled. Having an option for aggregation keeps compliance, but retains the stricter security posture on default clusters	2019-04-07 02:27:40 -07:00
Dalton Hubble	9862888bb2	Reduce calico-node CPU request from 250m to 150m * calico-node uses only a small fraction of its CPU request (i.e. reservation) even under stress. The unbounded limit already allows usage to scale favorably in bursty cases * Motivation: On instance types that skew memory-optimized (e.g. GCP n1), over-requesting can push the system toward overcommitment (alerts can be tuned) * Overcommitment is not necessarily bad, but 250m seems too generous a minimum given the actual usage	2019-03-24 11:55:56 -07:00
Dalton Hubble	23f81a5e8c	Upgrade Calico from v3.5.2 to v3.6.0 * Add calico-ipam CRDs and RBAC permissions * Switch IPAM from host-local to calico-ipam! * `calico-ipam` subnets `ippools` (defaults to pod CIDR) into `ipamblocks` (defaults to /26, but set to /24 in Typhoon) * `host-local` subnets the pod CIDR based on the node PodCIDR field (set via kube-controller-manager as /24's) * Create a custom default IPv4 IPPool to ensure the block size is kept at /24 to allow 110 pods per node (Kubernetes default) * Retaining host-local was slightly preferred, but Calico v3.6 is migrating all usage to calico-ipam. The codepath that skipped calico-ipam for KDD was removed * https://docs.projectcalico.org/v3.6/release-notes/	2019-03-18 22:28:48 -07:00
Dalton Hubble	6cda319b9d	Revert "Update Calico from v3.5.2 to v3.6.0" * Calico is not using host-local IPAM as desired * This reverts commit `e6e051ef47`.	2019-03-18 21:32:23 -07:00
Dalton Hubble	e6e051ef47	Update Calico from v3.5.2 to v3.6.0 * Add calico-ipam CRDs and RBAC permissions * Continue using host-local IPAM * https://docs.projectcalico.org/v3.6/release-notes/	2019-03-18 21:03:27 -07:00
Dalton Hubble	1528266595	Resolve in-addr.arpa and ip6.arpa zones with CoreDNS kubernetes plugin * Resolve in-addr.arpa and ip6.arpa DNS PTR requests for Kubernetes service IPs and pod IPs * Previously, CoreDNS was configured to resolve in-addr.arpa PTR records for service IPs (but not pod IPs)	2019-03-04 22:33:21 -08:00
Dalton Hubble	593f0e3655	Add a readinessProbe to CoreDNS * https://github.com/kubernetes/kubernetes/pull/74137	2019-02-23 13:11:19 -08:00
Dalton Hubble	c5f5aacce9	Assign Pod Priority Classes to control plane components * Priority Admission Controller has been enabled since Typhoon v1.11.1 * Assign cluster and node components a builtin priorityClassName (higher is higher priority) to inform scheduler preemption, scheduling order, and node out-of-resource eviction order	2019-02-17 17:12:46 -08:00
Dalton Hubble	7dc8f8bf8c	Switch CoreDNS to use the forward plugin instead of proxy * Use the forward plugin to forward to upstream resolvers, instead of the proxy plugin. The forward plugin is reported to be a faster alternative since it can re-use open sockets * https://coredns.io/explugins/forward/ * https://coredns.io/plugins/proxy/ * https://github.com/kubernetes/kubernetes/issues/73254	2019-01-30 22:19:13 -08:00
Dalton Hubble	7b06557b7a	Reduce kube-controller-manager --pod-eviction-timeout to 1m * Pods on preempted nodes should be moved to healthy nodes more quickly (1 min instead of 5 minutes)	2019-01-27 16:20:01 -08:00
Dalton Hubble	e892e291b5	Restore Kubelet authorization to delete nodes * Fix a regression caused by lowering the Kubelet TLS client certificate to system:nodes group (#100) since dropping cluster-admin dropped the Kubelet's ability to delete nodes. * On clouds where workers can scale down (manual terraform apply, AWS spot termination, Azure low priority deletion), worker shutdown runs the delete-node.service to remove a node to prevent NotReady nodes from accumulating * Allow Kubelets to delete cluster nodes via system:nodes group. Kubelets acting with system:node and kubelet-delete ClusterRoles is still an improvement over acting as cluster-admin	2019-01-14 23:26:41 -08:00
Dalton Hubble	f1e69f1d93	Re-enable kube-scheduler and kube-controller-manager HTTP ports * Fix regression added in `48730c0f12`, allow Prometheus to scrape metrics from kube-scheduler and kube-controller-manager	2019-01-11 23:52:57 -08:00
Dalton Hubble	48730c0f12	Probe kube-scheduler and kube-controller-manager HTTPS ports * Disable kube-scheduler and kube-controller-manager HTTP ports	2019-01-09 20:50:57 -08:00
Dalton Hubble	0e65e3567e	Enable certificates.k8s.io API certificate issuance * Allow kube-controller-manager to sign Approved CSR's using the cluster CA private key to issue cluster certificates * System components that need to use certificates signed by the cluster CA can submit a CSR to the apiserver, have an admin inspect and manually approve it, and be issued a certificate * Admins should inspect CSRs very carefully to ensure their origin and authorization level are appropriate * https://kubernetes.io/docs/tasks/tls/managing-tls-in-a-cluster/#approving-certificate-signing-requests	2019-01-06 17:17:03 -08:00
Dalton Hubble	4f8952a956	Disable anonymous auth on the bootstrap kube-apiserver * Anonymous auth isn't used during bootstrapping and can be disabled	2019-01-05 21:48:40 -08:00
Dalton Hubble	ea30087577	Structure control plane manifests neatly	2019-01-05 21:47:30 -08:00
Dalton Hubble	847ec5929b	Consolidate both variants of the admin kubeconfig * Provide an admin kubeconfig which includes a named context and also sets that context as the current-context * Retains support for both the KUBECONFIG=path style of usage or adding many kubeconfigs to a ~/.kube/configs folder and using `kubectl config use-context CLUSTER-context`	2019-01-05 14:56:45 -08:00
Dalton Hubble	f5ea389e8c	Update CoreDNS from v1.2.6 to v1.3.0 * https://coredns.io/2018/12/15/coredns-1.3.0-release/ * Limit log plugin to just log error class	2019-01-05 13:21:10 -08:00
Dalton Hubble	a7bd306679	Add admin kubeconfig and limit Kubelet cert to system:nodes group * Change Kubelet TLS client certificate to belong to the system:nodes group instead of the system:masters group (more limited) * Bind the system:node ClusterRole to the system:nodes group (yes, the ClusterRole is singular) * Generate separate admin.crt and admin.key files (which do still use system:masters). Output kubeconfig-kubelet and kubeconfig-admin values from the module * Remove the kubeconfig output to force users to pick the correct kubeconfig, depending on how the output is used (action required!) Related: * https://kubernetes.io/docs/reference/access-authn-authz/rbac/#core-component-roles Note, NodeAuthorizer/NodeRestriction would be an enhancement, but to work across platforms it effectively requires TLS bootstrapping which doesn't have a viable attestation strategy and clashes with CCM. This change improves Kubelet limitations, but intentionally doesn't aim to steer toward NodeAuthorizer/NodeRestriction	2019-01-02 23:08:09 -08:00
Dalton Hubble	7bcca25043	Use a kube-apiserver ServiceAccount and ClusterRoleBinding * Switch kube-apiserver from using the kube-system default ServiceAccount (with cluster-admin) to using a kube-apiserver ServiceAccount bound to cluster-admin (as before) * Remove the default-sa ClusterRoleBinding that allowed kube-apiserver and kube-scheduler (or other 3rd-party components added to kube-system) to use the kube-system default ServiceAccount for cluster-admin * Require all future components in kube-system define their own ServiceAccount	2019-01-01 17:30:28 -08:00
Dalton Hubble	fa4c2d8a68	Use a kube-scheduler ServiceAccount and ClusterRoleBinding * Switch kube-scheduler from using the kube-system default ServiceAccount (with cluster-admin) to using a kube-scheduler ServiceAccount bound to the builtin system:kube-scheduler and system:volume-scheduler (required for StorageClass) ClusterRoles * https://kubernetes.io/docs/reference/access-authn-authz/rbac/#core-component-roles	2019-01-01 17:29:36 -08:00
Dalton Hubble	d14348a368	Update Calico from v3.3.2 to v3.4.0 * Use an init container to install CNI plugins * Update the calico-node ClusterRole	2018-12-15 18:04:25 -08:00
Dalton Hubble	b101fddf6e	Configure kube-router to use in-cluster-kubeconfig * Use access token, but access apiserver via apiserver endpoint rather than internal service IP	2018-12-06 22:39:59 -08:00
Dalton Hubble	cff13f9248	Update hyperkube from v1.12.3 to v1.13.0 * Remove controller-manager empty dir mount added for v1.12 https://github.com/kubernetes/kubernetes/issues/68973 * No longer required https://github.com/kubernetes/kubernetes/pull/69884	2018-12-03 20:42:14 -08:00
Dalton Hubble	9d6f0c31d3	Add experimental kube-router CNI provider * Allow using kube-router for pod-to-pod networking and for NetworkPolicy	2018-12-03 19:42:02 -08:00
Dalton Hubble	bffb5d5d23	Update pod-checkpointer image to query Kubelet secure api * Updates pod-checkpointer to prefer the Kubelet secure API (before falling back to the Kubelet read-only API that is disabled on Typhoon clusters since https://github.com/poseidon/typhoon/pull/324) * Previously, pod-checkpointer checkpointed an initial set of pods during bootstrapping so recovery from power cycling clusters was unaffected, but logs were noisy * https://github.com/kubernetes-incubator/bootkube/pull/1027 * https://github.com/kubernetes-incubator/bootkube/pull/1025	2018-11-26 20:11:01 -08:00
Dalton Hubble	dbf67da1cb	Disable Calico usage reporting by default * Calico Felix has been reporting anonymous usage data about Calico version and cluster size * https://docs.projectcalico.org/v3.3/reference/felix/configuration * Add an enable_reporting variable and default to false	2018-11-18 23:41:19 -08:00
Dalton Hubble	39f9afb336	Add resource request to flannel and mount /run/flannel * Request 100m CPU without a limit (similar to Calico)	2018-11-11 15:56:13 -08:00
Dalton Hubble	3f3ab6b5c0	Enable CoreDNS loop and loadbalance plugins * loop sends an initial query to detect infinite forwarding loops in configured upstream DNS servers and fast exit with an error (it's a fatal misconfiguration on the network that will otherwise cause resolvers to consume memory/CPU until crashing, masking the problem) * https://github.com/coredns/coredns/tree/master/plugin/loop * loadbalance randomizes the ordering of A, AAAA, and MX records in responses to provide round-robin load balancing (as usual, clients may still cache responses though) * https://github.com/coredns/coredns/tree/master/plugin/loadbalance	2018-11-10 17:33:30 -08:00
Dalton Hubble	d045a8e6b8	Structure flannel/Calico manifests consistently * Organize flannel and Calico manifests to use consistent naming, structure, and ordering to align * Downside: Makes direct diff'ing with upstream harder, but that's become difficult lately anyway, since Calico uses a templating engine	2018-11-10 13:14:36 -08:00
Dalton Hubble	365d089610	Set kube-apiserver's kubelet preferred address types * Prefer InternalIP and ExternalIP over the node's hostname, to match upstream behavior and kubeadm * Previously, hostname-override was used to set node names to internal IP's to work around some cloud providers not resolving hostnames for instances (e.g. DO droplets)	2018-11-03 14:58:30 -07:00
Dalton Hubble	6a77775e52	Update CoreDNS from v1.2.2 to v1.2.4 * https://coredns.io/2018/10/17/coredns-1.2.4-release/ * https://coredns.io/2018/10/16/coredns-1.2.3-release/	2018-10-27 15:35:21 -07:00
Dalton Hubble	e0e5577d37	Update Calico from v3.2.3 to v3.3.0 * https://docs.projectcalico.org/v3.3/releases/	2018-10-23 20:26:48 -07:00
Dalton Hubble	79065baa8c	Fix CoreDNS AntiAffinity to prefer spreading pods	2018-10-17 22:15:53 -07:00
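
The Calico v3.6.0 entry (`23f81a5e8c`) keeps the calico-ipam block size at /24 rather than the /26 default, citing the Kubernetes default of 110 pods per node. A minimal sketch of that arithmetic follows. The pod CIDR `10.2.0.0/16` and the helper names are assumptions chosen for illustration, not values taken from the module, and the code models only address counts, not how Calico actually assigns blocks to nodes.

```python
import ipaddress

# Hypothetical pod CIDR base address, used only to build example networks.
EXAMPLE_BASE = "10.2.0.0"

# Kubernetes default maximum pods per node, as cited in the commit message.
MAX_PODS_PER_NODE = 110


def block_capacity(prefix_len: int) -> int:
    """Return the number of addresses in one IPAM block of the given prefix length."""
    return ipaddress.ip_network(f"{EXAMPLE_BASE}/{prefix_len}").num_addresses


def blocks_needed(pods: int, prefix_len: int) -> int:
    """Return how many blocks of the given size are needed to address `pods` pods."""
    capacity = block_capacity(prefix_len)
    return -(-pods // capacity)  # ceiling division


if __name__ == "__main__":
    for prefix_len in (26, 24):
        print(
            f"/{prefix_len}: {block_capacity(prefix_len)} addresses per block, "
            f"{blocks_needed(MAX_PODS_PER_NODE, prefix_len)} block(s) for "
            f"{MAX_PODS_PER_NODE} pods"
        )
```

Under these assumptions, a single /24 block covers a full node of 110 pods, while a /26 block does not.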


116 Commits