Commit Graph

62 Commits

Author SHA1 Message Date
Andrey Smirnov
01d696ed10 chore: update golangci-lint-1.23.3
`gomnd` disabled, as it complains about every number used in the code,
and `wsl` became much more thorough.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-04 08:56:39 -08:00
Andrey Smirnov
afa8a48174 chore: implement reboot test
Reboot test does node-by-node reboots followed by cluster health checks
(same as done by provisioner).

Fixed bug with `Read()` returning `Reader` instead of `ReadCloser`
(minor).

Allowed `bootkube` to be `Skipped` (for rebooted node).

Added support for doing checks via provided client instance.

Implemented generic capabilities to skip tests based on cluster
platform.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2020-02-03 11:02:43 -08:00
Andrew Rynhard
f3623d22b0 refactor: use tls.Config as client credentials
The `client.Creds` struct was not used very often, and made using the
`client.NewClient` function impossible to use in combination with the
`RemoteRenewingFileCertificateProvider`. This modifies
`client.NewClient` to accept a `tls.Config` instead of `client.Creds`,
allowing for the use of `RemoteRenewingFileCertificateProvider` with
`client.NewClient`.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2020-01-21 17:10:07 -08:00
Andrey Smirnov
0081ac5fac refactor: extract Talos cluster provisioner as common code
This extracts Docker Talos cluster provisioner as common code
which might be shared between `osctl cluster` and integration-test.

There should be almost no functional changes.

As a proof of concept, abstract cluster readiness checks were implemented
based on provisioned cluster state. They implement the same checks as
`basic-integration.sh` in pure Go via Talos/K8s clients.

`conditions` package was promoted from machined-internal to
`internal/pkg` as it is used to run the checks.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-27 12:14:19 -08:00
Andrey Smirnov
6e05dd70c4 feat: add support for tailing logs
Fixes #1564

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-17 22:35:47 +03:00
Andrey Smirnov
1fbf40796f feat: implement streaming mode of dmesg, parse messages
Fixes #1563

This implements dmesg reading via `/dev/kmsg`, with message parsing and
formatting. Kernel log facility and severity are parsed, timestamp is
calculated relative to boot time (it's accurate unless time jumps a
lot during node lifetime).

A new flag to follow dmesg was added; the tail flag streams only new
messages (ignoring old messages). We could try to implement tailing the last
N messages, which is just a bit more work; open to suggestions (for symmetry
with regular logs).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-16 17:40:15 +03:00
Andrew Rynhard
ad863a7f92 refactor: rename protobuf services, RPCs, and messages
This PR brings our protobuf files into conformance with the protobuf
style guide, and community conventions. It is purely renames, along with
generated docs.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-12-11 11:41:40 -08:00
Andrey Smirnov
399aeda0b9 feat: rename confusing target options, --endpoints, etc.
Fixes #1610

1. In `talosconfig`, deprecate `Target` in favor of `Endpoints`
(client-side LB to come next).

2. In `osctl`, use `--nodes` in place of `--target`.

3. In `osctl` add option `--endpoints` to override `Endpoints` for the
call.

Other changes are just updates to catch up with the changes. Most
probably I missed something... And CAPI provider needs update.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-10 02:23:54 +03:00
Andrey Smirnov
3a93e65b54 feat: make osd.Dmesg API streaming
This is to prepare for upcoming switch to reading `/dev/kmsg` which
should allow following logs, doing some kind of tail, etc.

The output is far from being perfect, as `dmesg` data is delivered as
single chunk (not as lines), but once server side updates, client side
should match it.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-09 23:52:35 +03:00
Andrey Smirnov
e907507aa6 fix: response filtering for client API, RunE for osctl
There are several changes which cleanup and address features of osctl,
mostly for multi-node requests:

* responses are filtered, so that client commands can print partial
failures/success responses;
* `RunE` is used in place of `Run` to propagate correct return sequence
on failures;
* cleaned up setting `targets` metadata on outgoing requests, it is set
by default in `globalCtx` already

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-09 11:24:58 -08:00
Andrey Smirnov
b1d282adf3 fix: error reporting in osctl kubeconfig
The problem seems to exist on multiple levels, and a few changes from
another PR got mixed in (the same file was changed).

The core of the issue is that `helpers.Fatalf()` calls `os.Exit()`, which
terminates execution and doesn't let `defer` and other handlers run. This
uses Cobra's error propagation feature to pass errors up through the stack
back to the root command.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-06 10:09:28 -08:00
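Note on b1d282adf3: the underlying Go behavior is that `os.Exit` ends the process immediately, so deferred functions never run. The sketch below shows the general pattern of returning errors to a single exit point, using only the standard library. It does not use Cobra, and `run` is a hypothetical command body rather than code from `osctl`.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// run is a hypothetical command body. It returns errors instead of exiting,
// so its deferred cleanup always executes.
func run(fail bool, cleanup func()) error {
	defer cleanup()

	if fail {
		return errors.New("kubeconfig: fetch failed")
	}

	return nil
}

func main() {
	err := run(false, func() { fmt.Println("cleanup ran") })
	if err != nil {
		// The only os.Exit call sits at the top level, after every
		// deferred function further down the stack has finished.
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

With Cobra, the equivalent shape is a command function that returns an error, leaving the exit decision to the caller of the root command.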
Andrey Smirnov
edb40437ec feat: add support for osctl logs -f
Now default is not to follow the logs (which is similar to `kubectl logs`).

Integration test was added for `Logs()` API and `osctl logs` command.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-05 13:58:52 -08:00
Andrey Smirnov
10a40a15d9 fix: extract errors from API response
This PR only touches `Version` method, but I will expand it to other
methods in the next PR.

When proxying to many upstreams, errors are wrapped as responses, as we
can't return both an error and a response from a gRPC call. A
reflection-based function was introduced to filter out responses which
contain errors and return those errors as a multierror. Reflection was used
because each response is a different Go type, and we can't write a generic
function for it.

osctl was updated to support having both resp & err not nil. One failed
response shouldn't result in error.

Re-enabled integration test for multiple targets and version
consistency, need e2e validation.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-05 09:44:10 -08:00
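Note on 10a40a15d9: a minimal sketch of the reflection idea, assuming every response type carries a `Messages` slice whose elements hold a `Metadata` pointer with an `Error` string. All type and function names here are hypothetical stand-ins, not the generated Talos types, and a plain `[]error` is returned instead of a multierror to keep the example dependency-free.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// Hypothetical stand-ins for generated response types.
type Metadata struct {
	Hostname string
	Error    string
}

type VersionMessage struct {
	Metadata *Metadata
	Tag      string
}

type VersionResponse struct {
	Messages []*VersionMessage
}

// filterMessages removes messages whose Metadata.Error is set from
// resp.Messages (in place) and returns one error per removed message.
// resp must be a non-nil pointer to a struct with an exported Messages slice.
func filterMessages(resp interface{}) []error {
	v := reflect.ValueOf(resp)
	if v.Kind() != reflect.Ptr || v.IsNil() || v.Elem().Kind() != reflect.Struct {
		return []error{errors.New("filter: expected non-nil pointer to struct")}
	}

	msgs := v.Elem().FieldByName("Messages")
	if !msgs.IsValid() || msgs.Kind() != reflect.Slice {
		return []error{errors.New("filter: no Messages slice")}
	}

	kept := reflect.MakeSlice(msgs.Type(), 0, msgs.Len())

	var errs []error

	for i := 0; i < msgs.Len(); i++ {
		msg := msgs.Index(i)

		if md := metadataOf(msg); md != nil && md.Error != "" {
			errs = append(errs, fmt.Errorf("%s: %s", md.Hostname, md.Error))
			continue
		}

		kept = reflect.Append(kept, msg)
	}

	msgs.Set(kept)

	return errs
}

// metadataOf extracts the Metadata field from a message value, if present.
func metadataOf(msg reflect.Value) *Metadata {
	elem := reflect.Indirect(msg)
	if !elem.IsValid() || elem.Kind() != reflect.Struct {
		return nil
	}

	field := elem.FieldByName("Metadata")
	if !field.IsValid() || !field.CanInterface() {
		return nil
	}

	md, _ := field.Interface().(*Metadata)

	return md
}

func main() {
	resp := &VersionResponse{Messages: []*VersionMessage{
		{Metadata: &Metadata{Hostname: "10.0.0.2"}, Tag: "v0.3.0"},
		{Metadata: &Metadata{Hostname: "10.0.0.3", Error: "connection refused"}},
	}}

	errs := filterMessages(resp)
	fmt.Println(len(resp.Messages), "ok,", len(errs), "failed")
}
```

The caller ends up with both a usable partial response and a list of per-node errors, which is the "both resp & err not nil" case the commit message describes.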
Andrey Smirnov
fc52025490 fix: provide peer remote address for 'NODE': as default in osctl
This change is pretty mechanical, just wrap every API so that remote
peer address is used as default for `resp.Metadata.Hostname`.

This makes `NODE:` non-empty in all the API calls.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-12-05 00:11:55 +03:00
Andrey Smirnov
5b7bea2471 feat: use grpc-proxy in apid
This replaces codegen version of apid proxying with
talos-systems/grpc-proxy based version. Proxying is transparent, it
doesn't require exact information about methods and response types. It
requires some common layout response to enhance it properly with node
metadata or errors.

There should be no significant changes to the API compared with the previous
version, but it's worth mentioning a few changes:

1. grpc.ClientConn is established just once per upstream (either local
service or remote apid instance).

2. When called without `-t` (`targets`), apid proxies immediately down
to local service skipping proxying to itself (as before), which results
in empty node metadata in response (before it had local node IP). Might
revert this later to proxy to itself (?).

3. Streaming APIs are now fully supported with multiple targets, but
message definition doesn't contain `ResponseMetadata`, so streaming APIs
are broken now with targets (needs a fix).

4. Errors are now returned as responses with `Error` field set in
`ResponseMetadata`, this requires client library update and `osctl` to
handle it properly.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-11-29 22:57:25 +03:00
Andrew Rynhard
ac089dc330 feat: add read API
This adds an API for reading arbitrary files.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-11-25 10:46:50 -08:00
Sekerin Evgeniy
83d5f4c721 feat: Add context key to osctl
Added a context key for changing the context in osctl.

Signed-off-by: Sekerin Evgeniy <sekerin.e.a@gmail.com>
2019-11-13 11:32:15 -08:00
Brad Beam
531e7d8144 feat: Add meminfo api
Add ability to retrieve node memory stats ( /proc/meminfo ).

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-11-10 21:02:43 -06:00
Andrey Smirnov
cdda81df66 test: add k8s integration tests
Once again, mostly groundwork and one simple test for node versions.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-11-06 17:08:44 -08:00
Brad Beam
41a4741bca refactor: Move logs to machined
This moves Logs endpoint to machined to reduce the mount footprint of osd.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-11-04 15:04:13 -08:00
Brad Beam
a4e1479b07 refactor: Move kubeconfig to machined
This moves the Kubeconfig api endpoint to machined and consolidates the
"read a file" code into machined. This also changes Kubeconfig to
use the CopyOut method which changes Kubeconfig to a streaming grpc call.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-11-04 14:45:23 -08:00
Brad Beam
3fd8abf426 chore: Move data messages to common proto
This allows reuse across multiple APIs.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-11-04 14:24:41 -06:00
Brad Beam
457c6416a6 feat: Add network api to apid
This extends apid to include the network api

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-10-28 04:21:48 -07:00
Brad Beam
ee24e42319 feat: Add time api to apid
This extends apid to cover the time api.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-10-25 14:35:14 -07:00
Andrey Smirnov
d3d011c8d2 chore: replace /* */ comments with // comments in license header
This fixes issues with `// +build` directives not being recognized in
source files.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-10-25 14:15:17 -07:00
Brad Beam
251ab16e07 feat: Add node metadata wrapper to machine api
- Added common.proto to host NodeMetadata
- go_package names were fixed up so imports are generated with the proper
  package names
- fixed up build work (dockerfile) to prevent copying the previously
  generated go proto files. This fixes a bug where we could incorrectly
  copy the previously generated protobuf instead of a new one generated
  at an incorrect location/name/etc.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-10-22 14:42:34 -05:00
Brad Beam
e6bf92ce31 feat(osd): Enable hitting multiple OSD endpoints
This enables the ability to specify additional <talos> endpoints to connect to
to pull back data.

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-10-16 15:30:25 -05:00
Andrew Rynhard
d430a37e46 refactor: use go 1.13 error wrapping
This removes the github.com/pkg/errors package in favor of the official
error wrapping in go 1.13.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-10-15 22:20:50 -07:00
Andrey Smirnov
c2cb0f9778 chore: enable 'wsl' linter and fix all the issues
I wish there were fewer of them :)

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-10-10 01:16:29 +03:00
Andrey Smirnov
bb5f5cc754 chore: bump golangci-lint to 1.20
Memory usage reduced around 8-10x: now it stays stable at 1GB.

I disabled some of the new linters, and one rule which is violated a
lot.

It might make sense to go back and enable `wsl`, fixing all the issues
(leaving that for another PR).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-10-09 22:21:08 +03:00
Andrew Rynhard
6ec5cb02cb refactor: decouple grpc client and userdata code
This detangles the gRPC client code from the userdata code. The
motivation behind this is to make creating clients more simple and not
dependent on our configuration format.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-26 14:18:53 -07:00
Andrew Rynhard
9ffa064a70 feat: return a struct for processes RPC
This makes working with the API much cleaner as a client. Using gob
doesn't give the client a well-known type to work with in the API
definition.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-20 16:18:05 -07:00
Andrew Rynhard
3a92537a30 refactor: rename RPCs
The following RPCs have been renamed:

- ps to containers
- top to processes
- df to mounts

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-20 14:33:51 -07:00
Andrew Rynhard
9230ff4e35 feat: return a data structure in version RPC
A byte slice is not very useful. Having a struct with fields makes for a
better experience.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-19 16:58:07 -07:00
Andrew Rynhard
6efd6fbe08 chore: move gRPC API to public
In order for other projects to make use of our APIs, they must not
reside underneath the internal directory. This moves the protobuf
definitions to a top-level "api" directory and scopes them according to
their domain. This change also removes generated code from the gitignore
file so that users don't have to generate the code themselves.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-09-19 08:55:13 -07:00
Andrey Smirnov
b68e6395d8 feat(machined): filter actions stop/start/restart on per-service level
This implements 'default deny' policy for service operations via the
API: services do not allow operations.

Service whitelists itself for stop/start/restart by implementing the
interface and returning boolean flag which might depend on userdata.

Machined APIs `Stop/Start` were renamed to `ServiceStop`/`ServiceStart`
to avoid confusion with osd API `Restart` which is not related to
services. Old APIs are deprecated and compatibility code forwards old
APIs to the new code.

`ServiceRestart` API was introduced to distinguish restart action from
stop/start (previously restart was implemented as stop+start in the
CLI).

Service udevd-trigger was whitelisted for all operations (allows
stopping hanging run, restarting to trigger once again).

Services proxyd & ntpd were whitelisted for restart and start (start is
whitelisted to help with service stuck in stopped state while restarting).

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-09-13 00:38:19 +03:00
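Note on b68e6395d8: the "default deny" idea can be sketched with an optional interface: a service that does not implement it allows nothing, and one that does returns a boolean per operation. The interface and type names below (`apiPolicy`, `allowed`, `plainService`) are hypothetical and are not the interfaces introduced by the commit; the `ntpd` and `udevdTrigger` policies mirror the description in the message.

```go
package main

import "fmt"

// Op names a service operation requested via the API.
type Op string

const (
	OpStart   Op = "start"
	OpStop    Op = "stop"
	OpRestart Op = "restart"
)

// Service is a minimal hypothetical service interface.
type Service interface{ ID() string }

// apiPolicy is the optional (hypothetical) interface a service implements
// to whitelist itself for specific operations.
type apiPolicy interface{ AllowAPI(op Op) bool }

// allowed applies default deny: no policy interface means no operations.
func allowed(svc Service, op Op) bool {
	policy, ok := svc.(apiPolicy)
	return ok && policy.AllowAPI(op)
}

// plainService implements no policy, so every operation is denied.
type plainService struct{}

func (plainService) ID() string { return "plain" }

// ntpd allows start and restart, but not stop.
type ntpd struct{}

func (ntpd) ID() string          { return "ntpd" }
func (ntpd) AllowAPI(op Op) bool { return op == OpStart || op == OpRestart }

// udevdTrigger allows every operation.
type udevdTrigger struct{}

func (udevdTrigger) ID() string       { return "udevd-trigger" }
func (udevdTrigger) AllowAPI(Op) bool { return true }

func main() {
	for _, svc := range []Service{plainService{}, ntpd{}, udevdTrigger{}} {
		fmt.Printf("%s: stop allowed = %v\n", svc.ID(), allowed(svc, OpStop))
	}
}
```

Keeping the policy on the service itself lets the returned flag depend on runtime configuration, as the commit message notes for userdata.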
Andrew Rynhard
d89b199825 chore: change upgrade request "url" to "image"
This aligns the nomenclature used throughout the codebase.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-27 21:43:20 -07:00
Brad Beam
692571bdec feat(networkd): Add grpc endpoint
Allows us to list routes and interface details

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-25 19:48:08 -07:00
Brad Beam
d36007fb29 feat(osd): Add ntpd client
Allows us to access ntp api

Signed-off-by: Brad Beam <brad.beam@talos-systems.com>
2019-08-25 13:38:34 -07:00
Seán C McCord
d0ff28a8c7 fix: enclose server address in brackets if IPv6
Fixes #980

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2019-08-10 17:42:17 -07:00
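Note on d0ff28a8c7: in Go, `net.JoinHostPort` from the standard library adds the brackets an IPv6 literal needs before a port is appended. Whether the commit uses this function is not stated in the message, so the snippet is only an illustration; `serverAddress` is a hypothetical helper and the port value is arbitrary.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// serverAddress (hypothetical helper) formats a dial target; IPv6 literals
// are wrapped in brackets so the port separator stays unambiguous.
func serverAddress(host string, port int) string {
	return net.JoinHostPort(host, strconv.Itoa(port))
}

func main() {
	fmt.Println(serverAddress("10.0.0.1", 50000)) // 10.0.0.1:50000
	fmt.Println(serverAddress("fd00::1", 50000))  // [fd00::1]:50000
}
```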
Andrew Rynhard
90c91807bd refactor: restructure the project layout
This change moves packages into more appropriate places.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-08-01 22:19:42 -07:00
Andrey Smirnov
9c63f4ed0a feat(init): implement complete API for service lifecycle (start/stop)
It is now possible to `start`/`stop`/`restart` any service via `osctl`
commands.

There are some changes in `ServiceRunner` to support re-use (re-entering
running state). `Services` singleton now tracks service running state to
avoid calling `Start()` on already running `ServiceRunner` instance.
Method `Start()` was renamed to `LoadAndStart()` to break up service
loading (adding to the list of service) and actual service start.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-08-01 11:16:57 -07:00
Andrew Rynhard
b4383e35db feat: move df API to init
This change allows for more accurate mount reporting as /proc/mounts is
a symlink to /proc/self/mounts and contains mounts that are relative to
the running process. In our case this was osd. This caused inaccurate
reporting of mounts since they were relative to osd when we really
wanted mounts relative to machined.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-24 15:28:37 -07:00
Andrew Rynhard
8e8aae98dd feat: add machined
This commit splits our current init into init and machined.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-16 13:12:21 -07:00
Andrew Rynhard
5d8ee0a3a5 fix: use existing logic to perform reset
This PR moves the reset API to the init API definition.
It leverages the same code we use for upgrades.

Signed-off-by: Andrew Rynhard <andrew@andrewrynhard.com>
2019-07-04 18:26:14 -07:00
Andrey Smirnov
237e903f91 feat(osd): implement CRI inspector for containers (#817)
Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-07-02 15:48:00 -07:00
Andrey Smirnov
76071abbb8 feat(init): move 'ls' API to init from osd (#755)
Service `osd` doesn't have access to rootfs, as it is running in a
container, so move API to `init` which has unconstrained access to
rootfs. (This is in line with another API, `osctl cp`).

Fixes: #752

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-06-21 22:29:39 +03:00
Andrey Smirnov
9ed45f7090 feat(osctl): implement 'cp' to copy files out of the Talos node (#740)
Actual API is implemented in the `init`, as it has access to root
filesystem. `osd` proxies API back to `init` with some tricks to support
grpc streaming.

Given some absolute path, `init` produces and streams back .tar.gz
archive with filesystem contents.

`osctl cp` works in two modes. First mode streams data to stdout, so
that we can do e.g.: `osctl cp /etc - | tar tz`. Second mode extracts
archive to specified location, dropping ownership info and adjusting
permissions a bit. Timestamps are not preserved.

If a full dump with owner/permissions is required, it's better to stream
data to `tar xz`; for a quick and dirty look into filesystem contents
as an unprivileged user, it's easier to use in-place extraction.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-06-20 17:02:58 -07:00
Andrey Smirnov
0c0a0340b2 fix(osctl): allow '-target' flag for osctl restart (#732)
I couldn't find any use for the `timeout` flag or for the value passed in
the API, but it blocked the much more useful `target` flag, which is
present in other commands.

Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
2019-06-14 21:37:57 +03:00
Seán C. McCord
532a53bfaf feat(init): Implement 'ls' command (#721)
Fixes #719

Signed-off-by: Seán C McCord <ulexus@gmail.com>
2019-06-07 10:19:20 -07:00