Commit Graph

66 Commits

Author SHA1 Message Date
Spencer Smith
1cbbf9cd5a feat: update talos base packages
This PR will update the base packages to the latest versions. Updated
packages are:

- ca-certificates
- cni
- iptables
- kernel
- kmod
- libseccomp
- musl
- runc
- socat
- util-linux
- xfsprogs

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-03-17 19:08:13 -04:00
Spencer Smith
853ce16df4 feat: respect panic kernel flag
This PR allows Talos to respect the panic=0 flag if users pass that in
their kernel args. Doing this makes it easier to catch kernel panics in
debug scenarios and allows the user to manually trigger a restart with
ctrl+alt+del when they're ready.
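
A minimal Go sketch of how a kernel command line could be checked for the flag described above. The helper name `panicDisabled` is hypothetical and the code is not taken from the Talos source; it only assumes the command line is a whitespace-separated string such as the content of `/proc/cmdline`.

```go
package main

import (
	"fmt"
	"strings"
)

// panicDisabled reports whether the kernel command line carries the exact
// argument panic=0. The helper name is hypothetical.
func panicDisabled(cmdline string) bool {
	for _, arg := range strings.Fields(cmdline) {
		if arg == "panic=0" {
			return true
		}
	}
	return false
}

func main() {
	// Sample command lines; only the panic= argument matters here.
	fmt.Println(panicDisabled("console=ttyS0 panic=0"))
	fmt.Println(panicDisabled("console=ttyS0 panic=10"))
}
```

Matching the whole argument rather than a substring keeps values such as `panic=10` from being mistaken for `panic=0`.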

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-03-10 13:21:34 -04:00
Spencer Smith
b1e4b3891f chore: cleanup assets dir after bootkube is done
This PR will clean up bootkube assets regardless of whether bootkube
succeeds. This will allow for a failed bootkube deployment to retry on
reboot.
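
The "clean up regardless of outcome" behaviour can be sketched in Go with a deferred removal. This is an illustration under assumptions, not the Talos code: `runWithCleanup` and `demo` are hypothetical names, and a temporary directory stands in for the bootkube assets directory.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// runWithCleanup runs the given step against assetDir and removes the
// directory afterwards. The deferred call executes on every return path,
// so the directory is removed whether the step succeeds or fails.
func runWithCleanup(assetDir string, run func(dir string) error) error {
	defer os.RemoveAll(assetDir)
	return run(assetDir)
}

// demo creates a temporary asset directory, runs a step that always fails,
// and reports whether the directory still exists afterwards.
func demo() (bool, error) {
	dir, err := os.MkdirTemp("", "assets-")
	if err != nil {
		return false, err
	}
	runErr := runWithCleanup(dir, func(string) error {
		return errors.New("bootstrap failed")
	})
	_, statErr := os.Stat(dir)
	return statErr == nil, runErr
}

func main() {
	exists, err := demo()
	fmt.Println("directory still exists:", exists, "error:", err)
}
```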

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-03-06 14:25:44 -05:00
Spencer Smith
12bfd8dd94 feat: allow for persistence of config data
This PR will allow users to set the `persist: true` value in their
config data to tell talos not to re-pull the config data at each reboot.
The default will still remain as a "pull every time" methodology in order
to encourage immutability by default.
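
A minimal Go sketch of the decision described above. The `config` type, its fields, and the `pull` callback are assumptions made for illustration and do not reflect the real Talos config types.

```go
package main

import "fmt"

// config is a hypothetical stand-in for the node's config data.
type config struct {
	Persist bool
	Source  string
}

// loadConfig reuses the saved config only when it was stored with
// persist: true; otherwise it pulls a fresh copy, which is the default.
// The second return value reports whether a pull happened.
func loadConfig(saved *config, pull func() (config, error)) (config, bool, error) {
	if saved != nil && saved.Persist {
		return *saved, false, nil
	}
	cfg, err := pull()
	return cfg, true, err
}

func main() {
	pull := func() (config, error) { return config{Source: "remote"}, nil }
	cfg, pulled, err := loadConfig(&config{Persist: true, Source: "disk"}, pull)
	fmt.Println(cfg.Source, pulled, err)
}
```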

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-03-06 11:42:00 -05:00
Andrey Smirnov
a068acfbe4 feat: split routerd from apid
The new service `routerd` performs exactly one task: based on the
service name of the incoming API call, it routes the request to the
appropriate Talos service (`networkd`, `osd`, etc.). `routerd` listens
on a file socket and routes requests to file sockets.

Service `apid` now performs a single task as well:

* it either fans out requests to `apid` services running on other
nodes and aggregates the responses
* or it forwards requests to the local `routerd` (when the request
destination is the local node)

Cons:

* one more proxying layer on request path

Pros:

* clearer service roles
* `routerd` is part of core Talos, services should register with it to
expose their API; no auth in the service (not exposed to the world)
* `apid` might be replaced with another implementation; it depends on TLS infra,
auth, etc.
* `apid` is better segregated from other Talos services (can only access
`routerd`, can't talk to other Talos services directly, so less exposure
in case of a bug)
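
Routing by service name can be sketched in Go as follows. The sketch relies on the gRPC convention that a full method name has the form `/package.Service/Method`. The `router` type, the service names, and the socket paths are made up for illustration and are not the `routerd` implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// serviceFromMethod extracts the service part of a gRPC full method name,
// which has the form "/package.Service/Method".
func serviceFromMethod(fullMethod string) (string, bool) {
	trimmed := strings.TrimPrefix(fullMethod, "/")
	idx := strings.LastIndex(trimmed, "/")
	if idx <= 0 {
		return "", false
	}
	return trimmed[:idx], true
}

// router maps a gRPC service name to the file socket of its backend.
type router struct {
	backends map[string]string
}

// register records which socket serves the given service.
func (r *router) register(service, socket string) {
	r.backends[service] = socket
}

// route returns the socket that should receive the given call.
func (r *router) route(fullMethod string) (string, error) {
	service, ok := serviceFromMethod(fullMethod)
	if !ok {
		return "", fmt.Errorf("malformed method name %q", fullMethod)
	}
	socket, ok := r.backends[service]
	if !ok {
		return "", fmt.Errorf("no backend registered for service %q", service)
	}
	return socket, nil
}

func main() {
	r := &router{backends: map[string]string{}}
	// Hypothetical service names and socket paths.
	r.register("example.NetworkService", "/run/example/networkd.sock")
	r.register("example.OSService", "/run/example/osd.sock")

	socket, err := r.route("/example.NetworkService/Interfaces")
	fmt.Println(socket, err)
}
```

Because the lookup depends only on the service name, a backend that registers itself needs no per-method configuration in the router.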

This change is a no-op for end users.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-03-05 22:05:56 +03:00
Andrey Smirnov
bbe2c53d29 feat: generate kubeconfig on the fly on request
This extracts admin kubeconfig generation out of bootkube, now based on
Talos x509 library. On each API request for `kubeconfig`, config is
generated on the fly and sent back on the wire.

This fixes two issues:

* any master node can now generate `kubeconfig` (worker nodes can do
that too, but that should probably change in the future)
* after an upgrade that wipes the disk, `osctl kubeconfig` still
works

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-28 21:00:52 +03:00
Andrey Smirnov
e6dc87dfa4 chore: update pkgs & tools for Go 1.14
See also:

* https://github.com/talos-systems/tools/pull/89
* https://github.com/talos-systems/pkgs/pull/103

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-27 01:15:46 +03:00
Andrey Smirnov
923ef4537b test: implement new class of tests: provision tests (upgrades)
This class of tests is included/excluded by build tags, but as it is
quite different from other integration tests, we build it as a separate
executable. Provision tests provision a cluster for the test run, perform
some actions and verify the results (could be upgrade, reset, scale
up/down, etc.)

There is now a framework for implementing upgrade tests. The first test
upgrades from the latest 0.3 release (0.3.2 at the moment) to the current
version of Talos (being built in CI). The test starts by booting with the
0.3 kernel/initramfs, runs the 0.3 installer to install a 0.3.2 cluster,
waits for bootstrap, and then upgrades to 0.4 in a rolling fashion. As
Firecracker supports a bootloader, this boots the 0.4 system from the
boot disk (as installed by the installer).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-21 07:04:03 -08:00
Andrey Smirnov
fae5e6915d chore: rework firecracker code around upstream Go SDK + PRs
This removes the use of a private fork with custom `ip=` kernel argument
handling and switches fully to the upstream version.

Firecracker Go SDK version is `master` + following PRs:

* https://github.com/firecracker-microvm/firecracker-go-sdk/pull/167
* https://github.com/firecracker-microvm/firecracker-go-sdk/pull/177
* https://github.com/firecracker-microvm/firecracker-go-sdk/pull/178

MTU handling support was implemented as well.

Changes:

* the hostname of each node is passed via the `talos.hostname=` kernel arg
* IP configuration is generated by the SDK from the CNI result
* fixed bugs with wrong netmask
* nameservers & MTU are passed via Talos config

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-01-29 02:35:15 +03:00
Andrey Smirnov
9da687d2a3 test: firecracker provisioner fixes, implement cluster destroy
This implements `osctl cluster destroy` for Firecracker and adds a
new utility command, `osctl cluster show`.

Firecracker mode now has a control process for Firecracker VMs, allowing
clean reboots and background operations.

Lots of small fixes to Firecracker mode: clean CNI shutdown, cleaning up
netns, etc.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-01-21 17:11:06 -08:00
Spencer Smith
67e50f6f50 feat: allow for bootkube images to be customized
This PR allows for pod checkpointer and coredns images to be customized
for bootkube. We can already customize the hyperkube image and all other
images used by bootkube are CNI-related and can be customized with the
"custom" CNI setup.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-01-21 11:17:28 -08:00
Spencer Smith
60260c85d1 feat: upgrade kubernetes version to 1.17.1
This PR will bring in the latest point release of k8s 1.17

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-01-17 09:39:26 -08:00
Andrey Smirnov
2bf8540855 test: provision Talos clusters via Firecracker VMs
This is the initial PR to push the initial code; it has several known
problems, which are going to be addressed in follow-up PRs:

1. there's no "cluster destroy", so the only way to stop the VMs is to
`pkill firecracker`

2. provisioner creates state in `/tmp` and never deletes it, that is
required to keep cluster running when `osctl cluster create` finishes

3. doesn't run any controller process around firecracker to support
reboots/CNI cleanup (vethxyz interfaces are lingering on the host as
they're never cleaned up)

The plan is to create some structure in `~/.talos` to manage cluster
state, e.g. `~/.talos/clusters/<name>` which will contain all the
required files (disk images, file sockets, VM logs, etc.). This
directory structure will also work as a way to detect running clusters
and clean them up.

For point number 3, `osctl cluster create` is going to exec a lightweight
process to control the firecracker VM process and to simulate VM reboots
if firecracker finishes cleanly (when the VM reboots).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-01-16 00:27:08 +03:00
Andrew Rynhard
cb93646c07 fix: update kernel version constant
This needs to be updated for integration tests.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-01-12 09:21:19 -08:00
Andrew Rynhard
7edd96947a feat: upgrade Linux to v5.4.10
This brings in the latest stable Linux.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-01-10 20:51:07 -08:00
Andrew Rynhard
4242acd085 feat: upgrade linux to v5.4.8
This brings in the latest 5.4 kernel.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-01-08 11:59:05 -06:00
Andrew Rynhard
e4a1bc3cf9 chore: add help menu to the Makefile
This adds a help menu to the Makefile. It documents all build
dependencies, and how to get started.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-12-25 11:11:41 -08:00
Andrew Rynhard
907f87d8e0 feat: upgrade Linux to v5.4.5
This brings in the latest stable version of Linux.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-12-19 17:43:34 -08:00
Brad Beam
9584b47cd7 feat: Upgrade kubernetes to 1.17.0
Primarily doc/constant changes.

Added additional bits to the `docs` target in the makefile to generate osctl
docs as well as config files. Explicitly define a HOME variable so we
get consistent home directories for talosconfig variables in our docs.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-12-10 16:03:35 -08:00
Andrew Rynhard
fa515b8117 fix: kill POD network mode pods first on upgrades
When we upgrade a node, we kill off all pods before performing a fresh
install. The issue with this is that we run the risk of killing the CNI
pod before we finish killing all other pods, leaving the CRI unable to
teardown the pod's networking. This works around that by first killing
any pods running without host networking so that the CNI can do its
job, and then removing the remaining pods.
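
The ordering described above can be sketched in Go. The `pod` type and its fields are hypothetical simplifications and do not correspond to the CRI API types. The sketch only shows the two-pass ordering: pods on the pod network first, host-network pods (such as the CNI pod) last.

```go
package main

import "fmt"

// pod is a simplified, hypothetical stand-in for a CRI pod sandbox.
type pod struct {
	name        string
	hostNetwork bool
}

// removalOrder returns the pods ordered so that those using pod networking
// come first and host-network pods come last. The relative order within
// each group is preserved.
func removalOrder(pods []pod) []pod {
	ordered := make([]pod, 0, len(pods))
	for _, p := range pods {
		if !p.hostNetwork {
			ordered = append(ordered, p)
		}
	}
	for _, p := range pods {
		if p.hostNetwork {
			ordered = append(ordered, p)
		}
	}
	return ordered
}

func main() {
	pods := []pod{
		{name: "cni", hostNetwork: true},
		{name: "coredns", hostNetwork: false},
		{name: "app", hostNetwork: false},
	}
	for _, p := range removalOrder(pods) {
		fmt.Println(p.name)
	}
}
```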

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-12-09 13:45:31 -08:00
Spencer Smith
92b5bd9b2b feat: allow ability to specify custom CNIs
This PR will allow users to specify one or many URLs for CNI so that
they can bypass bootkube deploying flannel and bring their own. Will
close #1593

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-12-06 15:27:36 -05:00
Andrew Rynhard
7b6a1fdc94 fix: update kernel version constant
This is required to pass integration tests.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-12-04 20:27:53 -08:00
Andrew Rynhard
d4c202438c refactor: set CRI config to /etc/cri/containerd.toml
This changes the CRI specific containerd instance's config to a
different path.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-12-04 19:32:00 -08:00
Andrew Rynhard
43e6703b8b feat: upgrade containerd to v1.3.2
This brings in the latest version of Containerd.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-12-04 10:19:51 -08:00
Andrew Rynhard
9745c3a504 fix: update kernel version constant
This is needed in order for integration tests to pass.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-12-02 15:26:28 -08:00
Andrey Smirnov
5b7bea2471 feat: use grpc-proxy in apid
This replaces codegen version of apid proxying with
talos-systems/grpc-proxy based version. Proxying is transparent, it
doesn't require exact information about methods and response types. It
requires some common layout response to enhance it properly with node
metadata or errors.

There should be no significant changes to the API compared with the
previous version, but it's worth mentioning a few changes:

1. grpc.ClientConn is established just once per upstream (either local
service or remote apid instance).

2. When called without `-t` (`targets`), apid proxies immediately down
to local service skipping proxying to itself (as before), which results
in empty node metadata in response (before it had local node IP). Might
revert this later to proxy to itself (?).

3. Streaming APIs are now fully supported with multiple targets, but
message definition doesn't contain `ResponseMetadata`, so streaming APIs
are broken now with targets (needs a fix).

4. Errors are now returned as responses with `Error` field set in
`ResponseMetadata`, this requires client library update and `osctl` to
handle it properly.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-11-29 22:57:25 +03:00
Andrew Rynhard
e78e1655f1 feat: upgrade packages
This brings in the following changes:

- Linux 5.3.13
- Containerd 1.3.1

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-11-25 10:41:47 -08:00
Andrey Smirnov
63212ab17e test: fix integration test for k8s version
Push versions to constants, introduce 'platform' to version API to
discover node mode. Check kernel version for non-containers.

A bit of refactoring on version package to expose something closer to a
single response.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-11-11 13:42:21 -08:00
Andrew Rynhard
17cce5468f feat: add metadata file to boot partition
This introduces the notion of metadata for a node. In this initial pass
there are only two fields. A timestamp to indicate when the install was
performed, and a field to indicate if the install was performed as part
of an upgrade.
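
A minimal Go sketch of a metadata record with the two fields described above. The field names and the JSON encoding are assumptions chosen for illustration; the commit message does not state the on-disk format.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// metadata mirrors the two fields described above. The field names and
// the JSON encoding are assumptions, not the actual file format.
type metadata struct {
	Timestamp time.Time `json:"timestamp"`
	Upgraded  bool      `json:"upgraded"`
}

// encode serializes the metadata record.
func encode(m metadata) ([]byte, error) {
	return json.Marshal(m)
}

// decode parses a previously encoded metadata record.
func decode(b []byte) (metadata, error) {
	var m metadata
	err := json.Unmarshal(b, &m)
	return m, err
}

func main() {
	b, err := encode(metadata{Timestamp: time.Now().UTC(), Upgraded: true})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```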

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-11-05 17:59:45 -08:00
Andrew Rynhard
5abbb9b041 fix: Avoid running bootkube on reboots
Since bootkube should only be run once, we need a way to determine if it
has already been run. This makes use of etcd to store a key-value pair
indicating that the cluster has been initialized.
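
The run-once guard can be sketched in Go against a small key-value interface. The `kv` interface, the in-memory store, and the key name are assumptions standing in for etcd; this is not the Talos code and does not use the etcd client.

```go
package main

import "fmt"

// kv is a minimal key-value abstraction standing in for etcd.
type kv interface {
	Get(key string) (string, bool)
	Put(key, value string)
}

// memKV is an in-memory kv used to keep the sketch self-contained.
type memKV map[string]string

func (m memKV) Get(key string) (string, bool) { v, ok := m[key]; return v, ok }
func (m memKV) Put(key, value string)         { m[key] = value }

// initializedKey is a hypothetical key name.
const initializedKey = "initialized"

// bootstrapOnce runs bootstrap only when the store holds no initialization
// marker, and records the marker after a successful run. The first return
// value reports whether bootstrap was attempted.
func bootstrapOnce(store kv, bootstrap func() error) (bool, error) {
	if _, ok := store.Get(initializedKey); ok {
		return false, nil
	}
	if err := bootstrap(); err != nil {
		return true, err
	}
	store.Put(initializedKey, "true")
	return true, nil
}

func main() {
	store := memKV{}
	for boot := 1; boot <= 2; boot++ {
		ran, err := bootstrapOnce(store, func() error { return nil })
		fmt.Println("boot", boot, "ran:", ran, "err:", err)
	}
}
```

Because the marker is written only after success, a failed bootstrap is attempted again on the next boot.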

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-11-01 15:20:43 -07:00
Andrew Rynhard
3c6d0135d0 feat: upgrade Kubernetes to 1.16.2
This brings in 1.16.2 modules and bumps the default hyperkube image.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-30 06:35:12 -07:00
Brad Beam
457c6416a6 feat: Add network api to apid
This extends apid to include the network api.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-10-28 04:21:48 -07:00
Brad Beam
ee24e42319 feat: Add time api to apid
This extends apid to cover the time api.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-10-25 14:35:14 -07:00
Andrey Smirnov
d3d011c8d2 chore: replace /* */ comments with // comments in license header
This fixes issues with `// +build` directives not being recognized in
source files.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-10-25 14:15:17 -07:00
Brad Beam
573cce8d18 feat: Add APId
This PR introduces APId. This service replaces the frontend functionality
previously provided by OSD. The main driver for this is twofold:

1. Create a single purpose application to expose the talos api

2. Make use of code generation to DRY api changes

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-10-25 13:02:33 -05:00
Andrew Rynhard
10b6202c4f refactor: improve metal platform
This brings in a few minor improvements to the metal platform. The first
is to use talos.config=metal-iso to indicate that the machine's config
can be found in an ISO image. The second is a fix to ensure that /mnt
exists.

This adds support for creating more than one node using the qemu-boot.sh
script.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-14 22:05:56 -07:00
Andrew Rynhard
80e3876df5 feat: remove proxyd
We have decided that proxyd is not the best architecture for HA
Kubernetes. Our recommendation to users will be to create a load
balancer instead.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-14 08:11:00 -07:00
Brad Beam
d3f20db0aa fix: Use correct names for kubelet config
With the change to bootkube, kubelet.conf has changed names and is now kubelet-kubeconfig.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-10-11 07:42:32 -07:00
Andrey Smirnov
bb5f5cc754 chore: bump golangci-lint to 1.20
Memory usage reduced around 8-10x: now it stays stable at 1GB.

I disabled some of the new linters, and one rule which is violated a
lot.

It might make sense to go back and enable `wsl`, fixing all the issues
(leaving that for another PR).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-10-09 22:21:08 +03:00
Andrew Rynhard
04313bd48c feat: add CNI, and pod and service CIDR to configurator
This adds more methods to the Cluster interface that allows for more
granular control of the cluster network settings.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-08 07:53:27 -07:00
Andrew Rynhard
b29391f0be feat: use bootkube for cluster creation
This replaces kubeadm with bootkube.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-07 17:17:57 -07:00
Andrew Rynhard
4ae8186107 feat: add configurator interface
This moves from translating a config into an internal config
representation, to using an interface. The idea is that an interface
gives us stronger compile-time checks, and will prevent us from having to copy
from one struct to another. As long as a concrete type implements the
Configurator interface, it can be used to provide instructions to Talos.
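
The idea can be sketched in Go. The methods on `Configurator` and the `v1Config` type below are simplified, hypothetical examples and are not the actual Talos interface. The sketch shows a consumer that depends only on the interface, plus a compile-time assertion that a concrete type satisfies it.

```go
package main

import "fmt"

// Configurator is a simplified, hypothetical version of the interface
// named above.
type Configurator interface {
	Hostname() string
	KubernetesVersion() string
}

// v1Config is one hypothetical concrete config format.
type v1Config struct {
	hostname string
	k8s      string
}

func (c v1Config) Hostname() string          { return c.hostname }
func (c v1Config) KubernetesVersion() string { return c.k8s }

// Compile-time check: the build fails if v1Config stops implementing
// Configurator.
var _ Configurator = v1Config{}

// describe consumes any Configurator directly, with no copy into an
// internal config struct.
func describe(c Configurator) string {
	return fmt.Sprintf("%s runs Kubernetes %s", c.Hostname(), c.KubernetesVersion())
}

func main() {
	fmt.Println(describe(v1Config{hostname: "node-1", k8s: "1.16.0"}))
}
```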

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-04 07:53:09 -07:00
Andrew Rynhard
e8dbf108e2 feat: add etcd service
This allows users to create an etcd service using the host init system.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-03 12:54:19 -07:00
Brad Beam
6038c4efe0 feat: Add kubeadm flex on etcd if service is enabled
This allows us to dynamically set an external etcd instance in the kubeadm configuration.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-10-01 13:33:52 -07:00
Andrew Rynhard
c44f7669e5 feat: allow Kubernetes version to be configured
This allows users to specify which version of Kubernetes to use.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-27 17:12:27 -07:00
Andrew Rynhard
6ec5cb02cb refactor: decouple grpc client and userdata code
This untangles the gRPC client code from the userdata code. The
motivation behind this is to make creating clients simpler and not
dependent on our configuration format.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-26 14:18:53 -07:00
Andrew Rynhard
607d68008c feat: use kubeadm to distribute Kubernetes PKI
This removes the trustd-based PKI distribution method in favor of
kubeadm's method.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-25 11:13:07 -07:00
Andrew Rynhard
f244673856 feat: write audit policy instead of using trustd
This changes the controlplane logic to write the audit policy to disk
from a common template instead of using trustd to distribute it.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-25 10:12:31 -07:00
Andrew Rynhard
82c706a0fb feat: upgrade Kubernetes to v1.16.0
Brings in Kubernetes v1.16.0.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-19 20:19:29 -07:00
Andrew Rynhard
21670978ca fix: log system services to /run/system/log
Writing system logs to /var/log breaks upgrades. The system disk unmount
fails with EBUSY. For now we can log to /run/system/log to avoid this.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-17 07:54:01 -07:00