Commit Graph

216 Commits

Author SHA1 Message Date
Andrew Rynhard
0051a43aee docs: improve CLI menu and metal docs
This addresses a few common points of confusion for new users.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-04-21 10:28:00 -07:00
Andrew Rynhard
4ccd4d5364 fix: set ephemeral partition to max size
This sets the size of the ephemeral partition to the maximum
allowed size at installation time. We have reports of `xfs_growfs` causing
extremely slow boot times when the disk is 1TB or more. In our research
we found evidence that `xfs_growfs` is an expensive operation when
growing to a size of 10 times or more of the base. Instead, users should
create the disk close to the max disk size at install time, because
`mkfs.xfs` handles larger disks better than `xfs_growfs` does.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-04-17 07:08:04 -07:00
Spencer Smith
8d2f8d6127 chore: remove random.trust_cpu references
This PR removes the references to adding in the random CPU trust to the
kernel for all v0.4 docs, as well as in the iso command in the
installer. This is no longer needed with the newer linux kernel.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-04-14 17:10:56 -07:00
Andrey Smirnov
5255883034 fix: make sure Close() is called on every path
For some places `.Close()` was clearly missing, for some of them I wanted
to be 200% sure it gets called on every code path.
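
As an illustration of the pattern this commit enforces (a minimal sketch, not code from the Talos tree; `resource` and `process` are hypothetical names), deferring `Close()` right after the resource is obtained makes it run on every return path:

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// resource is a hypothetical io.Closer that records whether Close was called.
type resource struct{ closed bool }

func (r *resource) Close() error {
	r.closed = true
	return nil
}

// Compile-time check that resource satisfies io.Closer.
var _ io.Closer = (*resource)(nil)

// process simulates work that may fail part-way through. Because Close is
// deferred before any early return, it runs on the error path and on the
// success path alike.
func process(r *resource, fail bool) (err error) {
	defer func() {
		// Surface the Close error only if nothing else has failed already.
		if cerr := r.Close(); cerr != nil && err == nil {
			err = cerr
		}
	}()

	if fail {
		return errors.New("processing failed")
	}

	return nil
}

func main() {
	for _, fail := range []bool{false, true} {
		r := &resource{}
		err := process(r, fail)
		fmt.Printf("fail=%v err=%v closed=%v\n", fail, err, r.closed)
	}
}
```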

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-04-03 19:16:01 -04:00
Andrey Smirnov
682dd433ba refactor: move Talos client package to pkg/
As this implements the Go client for the Talos API, it makes sense to
publish it at the top level.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-04-01 23:45:58 +03:00
Andrew Rynhard
47327eca09 fix: move empty label check
We should always set the fallback tag on an upgrade, and only revert if
the tag value is not an empty string.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-03-30 13:42:08 -07:00
Spencer Smith
b84d5e2660 feat: allow for exposing ports on docker clusters
This PR will introduce a `-p/--exposed-ports` flag to talosctl. This
flag will allow us to enable port forwards on worker nodes only. This
will allow for ingresses on docker clusters so we can hopefully use
ingress for Arges initial bootstrapping. I modeled this after how KIND allows ingresses
[here](https://kind.sigs.k8s.io/docs/user/ingress/)

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-03-30 15:24:25 -04:00
Andrew Rynhard
6fe5fed6f9 fix: make upgrades work with UEFI
Since the `--once` option of `extlinux` seems to only work with BIOS, we
needed to change to remove any reliance on this option. Instead of
booting the upgraded version once, and then making it the default after
a successful boot, we now make it the default, and then revert on any
boot error.
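
The "make it the default, then revert on error" flow can be sketched as follows. This is a simplified model under stated assumptions, not the Talos implementation: `bootConfig`, its fields, and the `boot-a`/`boot-b` labels are hypothetical, and the real code edits the bootloader configuration rather than an in-memory struct.

```go
package main

import "fmt"

// bootConfig is a hypothetical stand-in for the bootloader state.
type bootConfig struct {
	Default  string // label booted by default
	Fallback string // label to return to if the default fails to boot
}

// upgrade makes next the default immediately and remembers the previous
// default as the fallback.
func (c *bootConfig) upgrade(next string) {
	c.Fallback = c.Default
	c.Default = next
}

// revert restores the fallback after a boot error. It does nothing when no
// fallback is recorded, and reports whether a revert happened.
func (c *bootConfig) revert() bool {
	if c.Fallback == "" {
		return false
	}

	c.Default, c.Fallback = c.Fallback, ""

	return true
}

func main() {
	cfg := &bootConfig{Default: "boot-a"}

	cfg.upgrade("boot-b")
	fmt.Println("after upgrade:", cfg.Default, "fallback:", cfg.Fallback)

	if cfg.revert() {
		fmt.Println("after failed boot:", cfg.Default)
	}
}
```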

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-03-26 13:34:00 -07:00
Andrey Smirnov
104af4380e feat: make --wait default option to talosctl cluster create
It seems to be useful enough to be the default one and it prevents
simple mistakes while trying to access the cluster which is not ready
yet.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-03-25 06:36:43 -07:00
Andrew Rynhard
5dbc26c7a3 feat: rename osctl to talosctl
This is a rename of the osctl binary. We decided that talosctl is a
better name for the Talos CLI. This does not break any APIs, but does
make older documentation only accurate for previous versions of Talos.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-03-20 19:07:39 -07:00
Andrew Rynhard
69fa63a7b2 refactor: perform upgrade upon reboot
This PR introduces a new strategy for upgrades. Instead of attempting to
zap the partition table, create a new one, and then format the
partitions, this change will only update the `vmlinuz`, and
`initramfs.xz` being used to boot. It introduces an A/B style upgrade
process, which will allow for easy rollbacks. One deviation from our
original intention with upgrades is that this change does not completely
reset a node. It falls just short of that and does not reset the
partition table. This forces us to keep the current partition scheme in
mind as we make changes in the future, because an upgrade assumes a
specific partition scheme. We can improve upgrades further in the
future, but this will at least make them more dependable. Finally, one
more feature in this PR is the ability to keep state. This enables
single node clusters to upgrade since we keep the etcd data around.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-03-20 17:32:18 -07:00
Andrey Smirnov
564e9e3c00 feat: add support for --with-debug to osctl cluster create
This enables config option 'debug: yes' which redirects service logs to
console which helps debugging cases when API is not available.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-03-20 16:46:09 -07:00
Andrey Smirnov
0babc39653 feat: split osctl commands into Talos API and cluster management
This keeps backwards compatibility with `osctl` CLI binary with the
exception of `osctl config generate` which was renamed to `osctl
gen config` to avoid confusion with other `osctl config`
commands which operate on client config, not Talos server config.

Command implementation and helpers were split into subpackages for
cleaner code and more visible boundaries. The resulting binary still
combines commands from both sections into a single binary.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-03-20 22:45:04 +03:00
Spencer Smith
2f4ccfda9a fix: respect dns domain from machine config
BREAKING CHANGE: This PR fixes a bug where we were only passing `cluster.local` to the
kubelet configuration. It will also pull in a new version of the
bootkube fork to ensure that custom domains get propagated down to the
API Server certs, as well as the CoreDNS configuration for a cluster.

Existing users should be aware that, if they were previously trying to
use this option in machine configs, an upgrade may break
their cluster. It will update a kubelet flag with the new domain, but
CoreDNS and API Server certs will not change since bootkube has already
run. One option may be to change these values manually inside the
Kubernetes cluster. However, it may prove easier to rebuild the cluster
if necessary.

Additionally, this PR also exposes a flag to `osctl config generate`
to allow tweaking this domain value as well.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-03-20 12:28:17 -04:00
Spencer Smith
fa82454be4 chore: fix formatting of imports
This PR cleans up the formatting for various package imports as they
were causing the linter to throw errors.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-03-19 15:06:05 -04:00
Spencer Smith
12bfd8dd94 feat: allow for persistence of config data
This PR will allow users to set the `persist: true` value in their
config data to tell talos not to re-pull the config data at each reboot.
The default will still remain as a "pull every time" methodology in order
to encourage immutability by default.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-03-06 11:42:00 -05:00
Spencer Smith
856386a788 fix: ensure CA cert generation respects the hour flag
This PR fixes a bug with cert generation via `osctl gen ca`. The value
specified by the --hour flag was never being appended to the CA options
and also, since the default value for `hour` was being set on init, the
CA default was being overwritten by a subsequent command setup for the
CRT default (24h). This PR moves to using two distinct variables for
those hour values. Will fix #1911.
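
The class of bug described above can be reproduced with the standard library. The sketch below uses Go's `flag` package as a stand-in for the CLI framework `osctl` actually uses; the function names and the CA default of 87600 hours are assumptions for illustration. Binding two flags to one variable lets the later registration overwrite the earlier default, while two distinct variables keep both defaults intact.

```go
package main

import (
	"flag"
	"fmt"
)

const (
	defaultCAHours  = 87600 // assumed value, for illustration only
	defaultCRTHours = 24    // the certificate default named in the commit
)

// sharedVariable binds both flags to one variable. Registering the second
// flag stores its default into the same variable, clobbering the CA default.
func sharedVariable() int {
	var hours int

	ca := flag.NewFlagSet("ca", flag.ContinueOnError)
	crt := flag.NewFlagSet("crt", flag.ContinueOnError)

	ca.IntVar(&hours, "hour", defaultCAHours, "CA validity in hours")
	crt.IntVar(&hours, "hour", defaultCRTHours, "certificate validity in hours")

	_ = ca.Parse(nil) // no --hour given, so the CA default should apply

	return hours
}

// distinctVariables gives each command its own variable, so each default
// survives the registration of the other flag.
func distinctVariables() (caHours, crtHours int) {
	ca := flag.NewFlagSet("ca", flag.ContinueOnError)
	crt := flag.NewFlagSet("crt", flag.ContinueOnError)

	ca.IntVar(&caHours, "hour", defaultCAHours, "CA validity in hours")
	crt.IntVar(&crtHours, "hour", defaultCRTHours, "certificate validity in hours")

	_ = ca.Parse(nil)
	_ = crt.Parse(nil)

	return caHours, crtHours
}

func main() {
	fmt.Println("shared variable, CA hours:", sharedVariable())

	ca, crt := distinctVariables()
	fmt.Println("distinct variables, CA hours:", ca, "CRT hours:", crt)
}
```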

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-03-02 15:34:20 -05:00
Andrey Smirnov
bbe2c53d29 feat: generate kubeconfig on the fly on request
This extracts admin kubeconfig generation out of bootkube, now based on
Talos x509 library. On each API request for `kubeconfig`, config is
generated on the fly and sent back on the wire.

This fixes two issues:

* any master node can now generate `kubeconfig` (worker nodes can do
that too, but that should probably change in the future)
* after upgrade-and-wipe the disk scenario, `osctl kubeconfig` still
works

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-28 21:00:52 +03:00
Andrew Rynhard
9cf217d2c1 fix: default reboot flag to false
We should default to shutting down when resetting.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-02-19 16:14:00 -08:00
Andrew Rynhard
64b5b32732 refactor: use go-procfs
This makes use of the external procfs package that is based on the
package we are removing here.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-02-19 15:58:57 -08:00
Andrew Rynhard
8a3a76f73e fix: add reboot flag to reset command
This exposes the reboot option for the reset API by adding a `--reboot`
flag to the CLI.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-02-19 15:44:10 -08:00
Andrew Rynhard
fe7847e0b8 feat: add reboot flag to reset API
This adds the ability to automatically reboot a machine after a reset.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-02-19 05:10:58 -08:00
Spencer Smith
8092362098 fix: fix reset command
This PR will fix the reset command to actually wipe the system disk as
expected.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-02-18 16:18:43 -05:00
Andrey Smirnov
638929f319 chore: remove KubernetesVersion from provision request
Not sure how it got into `ClusterRequest`, but we're not using it.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-17 13:29:50 -08:00
Andrey Smirnov
e1779ac77c feat: implement registry mirror & config for image pull
When images are pulled by Talos or via the CRI plugin, configuration
for each registry is applied. Mirrors allow redirecting pull requests to
either a local registry or a cached registry. Auth & TLS enable
authentication and TLS authentication for non-public registries.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-14 00:28:59 +03:00
Andrey Smirnov
33332f4c74 chore: support bootloader emulation in firecracker provisioner
The Firecracker launcher tries to open the VM disk image before every boot,
parses the partition table, finds the boot partition, tries to read it as a
FAT32 filesystem, extracts the uncompressed kernel from `bzImage` (firecracker
doesn't support `bzImage` yet), extracts the initramfs and passes it to the
firecracker binary.

This flow allows for extended tests, e.g. testing installer, upgrade and
downgrade tests, etc.

Bootloader emulation is disabled by default for now, can be enabled via
`--with-bootloader-emulation` flag to `osctl cluster create`.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-13 23:21:37 +03:00
Andrey Smirnov
76c2038b13 chore: implement loadbalancer for firecracker provisioner
This PR contains generic simple TCP loadbalancer code, and glue code for
firecracker provisioner to use this loadbalancer.

K8s control plane is passed through the load balancer, and Talos API is
passed only to the init node (for now, as some APIs, including
kubeconfig, don't work with non-init node).
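
For readers unfamiliar with the idea, a generic round-robin TCP load balancer can be sketched in a few dozen lines. This is not the code from this PR: the `balancer` type, its methods, and the upstream addresses are hypothetical, and the sketch assumes at least one upstream is configured.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"sync"
)

// balancer forwards each accepted connection to the next upstream in turn.
type balancer struct {
	mu        sync.Mutex
	upstreams []string
	next      int
}

// pick returns the upstream for the next connection (round-robin).
func (b *balancer) pick() string {
	b.mu.Lock()
	defer b.mu.Unlock()

	addr := b.upstreams[b.next%len(b.upstreams)]
	b.next++

	return addr
}

// handle proxies bytes in both directions between the client and one upstream.
func (b *balancer) handle(client net.Conn) {
	defer client.Close()

	upstream, err := net.Dial("tcp", b.pick())
	if err != nil {
		return
	}
	defer upstream.Close()

	done := make(chan struct{}, 2)
	go func() { io.Copy(upstream, client); done <- struct{}{} }()
	go func() { io.Copy(client, upstream); done <- struct{}{} }()

	// Once one direction finishes, the deferred Close calls end the other.
	<-done
}

// serve accepts connections until the listener is closed.
func (b *balancer) serve(l net.Listener) error {
	for {
		conn, err := l.Accept()
		if err != nil {
			return err
		}

		go b.handle(conn)
	}
}

func main() {
	lb := &balancer{upstreams: []string{"10.5.0.2:6443", "10.5.0.3:6443"}}

	// Show the order in which upstreams would be chosen; a real program
	// would call lb.serve with a listener from net.Listen.
	for i := 0; i < 3; i++ {
		fmt.Println(lb.pick())
	}
}
```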

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-13 23:07:13 +03:00
Andrey Smirnov
01d696ed10 chore: update golangci-lint-1.23.3
`gomnd` disabled, as it complains about every number used in the code,
and `wsl` became much more thorough.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-04 08:56:39 -08:00
Andrey Smirnov
afa8a48174 chore: implement reboot test
Reboot test does node-by-node reboots followed by cluster health checks
(same as done by provisioner).

Fixed bug with `Read()` returning `Reader` instead of `ReadCloser`
(minor).

Allowed `bootkube` to be `Skipped` (for rebooted node).

Added support for doing checks via provided client instance.

Implemented generic capabilities to skip tests based on cluster
platform.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-03 11:02:43 -08:00
Andrey Smirnov
fae5e6915d chore: rework firecracker code around upstream Go SDK + PRs
This removes use of private fork with custom `ip=` kernel argument
handling and switches fully to upstream version of it.

Firecracker Go SDK version is `master` + following PRs:

* https://github.com/firecracker-microvm/firecracker-go-sdk/pull/167
* https://github.com/firecracker-microvm/firecracker-go-sdk/pull/177
* https://github.com/firecracker-microvm/firecracker-go-sdk/pull/178

MTU handling support was implemented as well.

Changes:

* hostname to each node is passed via `talos.hostname=` kernel arg
* IP configuration is generated by SDK from CNI result
* fixed bugs with wrong netmask
* nameservers & MTU are passed via Talos config

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-01-29 02:35:15 +03:00
Andrey Smirnov
9da687d2a3 test: firecracker provisioner fixes, implement cluster destroy
This implements `osctl cluster destroy` for Firecracker and adds a
new utility command, `osctl cluster show`.

Firecracker mode now has control process for firecracker VMs, allowing
clean reboots and background operations.

Lots of small fixes to Firecracker mode, clean CNI shutdown, cleaning up
netns, etc.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-01-21 17:11:06 -08:00
Andrew Rynhard
f3623d22b0 refactor: use tls.Config as client credentials
The `client.Creds` struct was not used very often, and made using the
`client.NewClient` function impossible to use in combination with the
`RemoteRenewingFileCertificateProvider`. This modifies
`client.NewClient` to accept a `tls.Config` instead of `client.Creds`,
allowing for the use of `RemoteRenewingFileCertificateProvider` with
`client.NewClient`.
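
One reason a `tls.Config` is more flexible than a fixed credentials struct is that it can supply the client certificate through a callback on each handshake. The sketch below shows only that standard-library mechanism; `certProvider`, `rotatingProvider`, and `newTLSConfig` are hypothetical names, and the real method set of `RemoteRenewingFileCertificateProvider` may differ.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// certProvider is a hypothetical interface standing in for a renewing
// certificate provider.
type certProvider interface {
	ClientCertificate() (*tls.Certificate, error)
}

// rotatingProvider counts how often a certificate is requested. A real
// provider would return its currently valid certificate.
type rotatingProvider struct{ calls int }

func (p *rotatingProvider) ClientCertificate() (*tls.Certificate, error) {
	p.calls++
	return &tls.Certificate{}, nil
}

// newTLSConfig wires the provider into the GetClientCertificate callback,
// so every TLS handshake asks the provider for a fresh certificate.
func newTLSConfig(p certProvider) *tls.Config {
	return &tls.Config{
		MinVersion: tls.VersionTLS12,
		GetClientCertificate: func(*tls.CertificateRequestInfo) (*tls.Certificate, error) {
			return p.ClientCertificate()
		},
	}
}

func main() {
	p := &rotatingProvider{}
	cfg := newTLSConfig(p)

	// Simulate two handshakes by invoking the callback directly.
	for i := 0; i < 2; i++ {
		if _, err := cfg.GetClientCertificate(nil); err != nil {
			fmt.Println("error:", err)
			return
		}
	}

	fmt.Println("certificate requests served:", p.calls)
}
```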

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-01-21 17:10:07 -08:00
Spencer Smith
6cf126dbca refactor: use ConfiguratorBundle interface for config generate
This PR overhauls osctl's config generate command to make use of the new
ConfigBundle implementation of the ConfiguratorBundle interface

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-01-17 15:50:19 -05:00
Andrey Smirnov
2bf8540855 test: provision Talos clusters via Firecracker VMs
This is the initial PR to push the initial code; it has several known
problems which are going to be addressed in follow-up PRs:

1. there's no "cluster destroy", so the only way to stop the VMs is to
`pkill firecracker`

2. provisioner creates state in `/tmp` and never deletes it, that is
required to keep cluster running when `osctl cluster create` finishes

3. doesn't run any controller process around firecracker to support
reboots/CNI cleanup (vethxyz interfaces are lingering on the host as
they're never cleaned up)

The plan is to create some structure in `~/.talos` to manage cluster
state, e.g. `~/.talos/clusters/<name>` which will contain all the
required files (disk images, file sockets, VM logs, etc.). This
directory structure will also work as a way to detect running clusters
and clean them up.

For point number 3, `osctl cluster create` is going to exec lightweight
process to control the firecracker VM process and to simulate VM reboots
if firecracker finishes cleanly (when VM reboots).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-01-16 00:27:08 +03:00
Andrew Rynhard
898cf01f0a refactor: unify generate type and machine type
We have been using two packages that define a config type and a machine
type, when really they are one and the same. This unifies the types down
to one set.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-01-10 16:46:28 -08:00
Spencer Smith
d2c03dfb1a refactor: use an interface for config data
This PR will move to using a `ConfiguratorBundle` interface for our
various config data files. This will allow us to easily abstract away
various versions and easily get the data with functions.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-01-10 14:24:13 -05:00
Spencer Smith
4c5cd2bb5c refactor: use config struct instead of string
This PR will pass the configs around as structs instead of strings.
We'll be using this to do a further refactor of the cluster create
command and the configurator interface.

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-01-09 08:47:09 -05:00
Spencer Smith
75d9f7b454 feat: support configurable docker-based clusters
This PR will allow users to issue `osctl config generate`, tweak the
configs to their liking, then use those configs to call `osctl cluster
create`.

Example workflow:

```
osctl config generate my-cluster https://10.5.0.2:6443 -o ./my-cluster

** tweaky tweak **

osctl cluster create --name my-cluster --input-dir "$PWD/my-cluster"
```

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2020-01-08 14:11:56 -05:00
Andrew Rynhard
815aa99cc4 fix: set the correct kernel args for VMware
This ensures that we default to using `guestinfo` for VMware's config
source, and that we use tty0 instead of ttyS0 for the console.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-01-01 08:21:09 -08:00
Andrew Rynhard
3f6a2cb7f7 fix: use the correct mf file name
This adds a variable to the mf template to render the mf file name
properly.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-12-30 18:53:25 -08:00
Andrey Smirnov
ebd40bd0eb chore: use osctl cluster --wait in basic-integration
There are a few workarounds for the Drone way of running the integration
test: DinD runs as a separate pod, and we can only access the ports it
exposes on the "host", while from the Talos cluster this endpoint is not
reachable.

So internally Talos nodes still use addresses like "10.5.0.2", while
test is using "docker" to access it (that's name of the `docker` service
in the pipeline).

When running locally, 127.0.0.1 is used as endpoint, which should work
fine both on OS X and Linux.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-30 15:15:42 -08:00
Andrey Smirnov
0081ac5fac refactor: extract Talos cluster provisioner as common code
This extracts Docker Talos cluster provisioner as common code
which might be shared between `osctl cluster` and integration-test.

There should be almost no functional changes.

As proof of concept, abstract cluster readiness checks were implemented
based on provisioned cluster state. It implements same checks as
`basic-integration.sh` in pure Go via Talos/K8s clients.

`conditions` package was promoted from machined-internal to
`internal/pkg` as it is used to run the checks.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-27 12:14:19 -08:00
Andrew Rynhard
5a7eb631b2 feat: add installer command to installer container
This replaces the entrypoint.sh shell script with a go binary.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-12-26 06:41:25 -08:00
Andrew Rynhard
e4a1bc3cf9 chore: add help menu to the Makefile
This adds a help menu to the Makefile. It documents all build
dependencies, and how to get started.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-12-25 11:11:41 -08:00
Andrew Rynhard
0fae1bc92d fix: fix output formats
This fixes random log issues.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-12-23 09:09:55 -08:00
Andrey Smirnov
3a021e4579 test: add integration tests for (most) CLI commands
I added tests for all the commands which work reliably in container mode.

Some tests are naive, some are more sophisticated. While going
through the tests, I think I found a small bug in `osctl gen keypair`.

When we get reliable KVM tests, I can revisit and add missing
tests for time, reboot, shutdown and friends.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-20 23:33:35 +03:00
Andrey Smirnov
26f222e6db refactor: osctl code cleanup, no functional changes
Fixes #1666

1. Remove custom validation of Args, use cobra-provided validators.
2. Always use errors to stop the execution flow, don't rely on
`log.Fatal` and `panic` for normal flows. This makes sure `defer` always
has a chance to run, connection is shut down in a clean way.
3. Command `docs` is hidden, as it's not for users.
4. Global variable `globalCtx` is removed, `WithClient` is used to pass
context to the command.
5. `setupClientE` renamed to `WithClient`, `setupClient` removed.
6. Code from `cmd/root.go` moved to `pkg/helpers` when possible.
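
Point 2 can be illustrated with a small sketch (hypothetical names, not the `osctl` code): when a function returns an error instead of calling `log.Fatal`, its deferred cleanup still runs, and only `main` decides the exit code.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// cleanedUp records whether the deferred cleanup ran; in a real CLI this
// would be something like closing a client connection.
var cleanedUp bool

// run holds the command logic and reports failure through its return value.
// log.Fatal here would call os.Exit and skip the deferred cleanup.
func run(args []string) error {
	defer func() { cleanedUp = true }()

	if len(args) == 0 {
		return errors.New("missing argument")
	}

	fmt.Println("processing", args[0])

	return nil
}

func main() {
	args := []string{"version"} // a real CLI would pass os.Args[1:]

	if err := run(args); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
}
```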

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-20 00:32:26 +03:00
Andrey Smirnov
53f1cda715 fix: update osctl list to report node name
For long format, node is reported always as first column.

For regular format, if `--nodes` wasn't specified, output goes as
before without node name, but if `--nodes` is used, output switches to
column format with node being first column.

Example:

```
$ osctl list /etc
.
ca-certificates
cni
cri
hostname
hosts
kubernetes
mtab
os-release
pki
resolv.conf
ssl
```

```
$ osctl list --nodes 10.5.0.2,10.5.0.3 /etc
NODE       NAME
10.5.0.2   .
10.5.0.2   ca-certificates
10.5.0.2   cni
10.5.0.3   .
10.5.0.3   ca-certificates
10.5.0.2   cri
10.5.0.2   hostname
10.5.0.2   hosts
10.5.0.2   kubernetes
10.5.0.2   mtab
10.5.0.2   os-release
10.5.0.2   pki
10.5.0.2   resolv.conf
10.5.0.2   ssl
10.5.0.3   cni
10.5.0.3   cri
10.5.0.3   hostname
10.5.0.3   hosts
10.5.0.3   kubernetes
10.5.0.3   mtab
10.5.0.3   os-release
10.5.0.3   pki
10.5.0.3   resolv.conf
10.5.0.3   ssl
```

A list from multiple nodes is hard to consume, as it is sorted neither by
node name nor by file name. This is not addressed in this PR.
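
One possible way to order such output, shown here only as a sketch and not as part of this PR, is to sort by node first and by file name second. The `entry` type and `sortEntries` function are hypothetical; note that node addresses are compared as plain strings, not as IP addresses.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a hypothetical row of the multi-node list output.
type entry struct {
	Node string
	Name string
}

// sortEntries orders rows by node, then by file name within each node.
func sortEntries(entries []entry) {
	sort.Slice(entries, func(i, j int) bool {
		if entries[i].Node != entries[j].Node {
			return entries[i].Node < entries[j].Node
		}

		return entries[i].Name < entries[j].Name
	})
}

func main() {
	entries := []entry{
		{"10.5.0.2", "cni"},
		{"10.5.0.3", "ca-certificates"},
		{"10.5.0.2", "ca-certificates"},
		{"10.5.0.3", "cni"},
	}

	sortEntries(entries)

	for _, e := range entries {
		fmt.Printf("%-10s %s\n", e.Node, e.Name)
	}
}
```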

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-19 01:17:02 +03:00
Andrey Smirnov
de35b4d5af fix: issues discovered by lgtm tool
Using `SafePath` function from `runc` (but had to create local copy as
`runc` doesn't build on OS X).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-18 21:43:59 +03:00
Spencer Smith
47ae0148a2 fix: use dash for default talos cluster name in docker
This PR updates talos_default to talos-default so all docker container
names look the same, as well as avoiding the potential to break b/c
that's not a valid dns name.
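
The DNS concern can be checked mechanically. The sketch below validates a name against the RFC 1123 label rules (lowercase letters, digits, and dashes; alphanumeric at both ends; at most 63 characters). `isDNSLabel` is a hypothetical helper, not a function from the Talos code base.

```go
package main

import (
	"fmt"
	"regexp"
)

// labelPattern matches an RFC 1123 label: lowercase alphanumerics and
// dashes, beginning and ending with an alphanumeric character.
var labelPattern = regexp.MustCompile(`^[a-z0-9]([a-z0-9-]*[a-z0-9])?$`)

// isDNSLabel reports whether name is a valid DNS label.
func isDNSLabel(name string) bool {
	return len(name) <= 63 && labelPattern.MatchString(name)
}

func main() {
	for _, name := range []string{"talos_default", "talos-default"} {
		fmt.Printf("%-14s valid=%v\n", name, isDNSLabel(name))
	}
}
```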

Signed-off-by: Spencer Smith <robertspencersmith@gmail.com>
2019-12-18 12:43:38 -05:00