This change introduces the ability for the Kubelet to monitor and report
the health of devices allocated via Dynamic Resource Allocation (DRA).
This addresses a key part of KEP-4680 by providing visibility into
device failures, which helps users and controllers diagnose pod failures.
The implementation includes:
- A new `v1alpha1.NodeHealth` gRPC service with a `WatchResources`
stream that DRA plugins can optionally implement.
- A health information cache within the Kubelet's DRA manager to track
the last known health of each device and handle plugin disconnections.
- An asynchronous update mechanism that triggers a pod sync when a
device's health changes.
- A new `allocatedResourcesStatus` field in `v1.ContainerStatus` to
expose the device health information to users via the Pod API.
Update vendor
KEP-4680: Fix lint, boilerplate, and codegen issues
Add another e2e test, add TODO for KEP4680 & update test infra helpers
Add Feature Gate e2e test
Fixing presubmits
Fix var names, feature gating, and nits
Fix DRA Health gRPC API according to review feedback
This is a complete revamp of the original API. Some of the key
differences:
- refocused on structured parameters and allocating devices
- support for constraints across devices
- support for allocating "all" or a fixed amount
of similar devices in a single request
- no class for ResourceClaims, instead individual
device requests are associated with a mandatory
DeviceClass
For the sake of simplicity, optional basic types (ints, strings) where the null
value is the default are represented as values in the API types. This makes Go
code simpler because it doesn't have to check for nil (consumers) and values
can be set directly (producers). The effect is that in protobuf, these fields
always get encoded because `opt` only has an effect for pointers.
The roundtrip test data for v1.29.0 and v1.30.0 changes because of the new
"request" field. This is considered acceptable because the entire `claims`
field in the pod spec is still alpha.
The implementation is complete enough to bring up the apiserver.
Adapting other components follows.
NodeResourceSlice will be used by kubelet to publish resource information on
behalf of DRA drivers on the node. NodeName and DriverName in
NodeResourceSlice must be immutable. This simplifies tracking the different
objects because what they are for cannot change after creation.
The new field in ResourceClass tells scheduler and autoscaler that they are
expected to handle allocation.
ResourceClaimParameters and ResourceClassParameters are new types for telling
in-tree components how to handle claims.
Now that they all call setup_env, we don't need find-binary (I think).
That was originally meant to hide the diff between docker and local
builds but all these tools do local builds anyway.
This makes "new" and "old" setup_env functions. In subsequent commits,
all callers of the "old" form will be fixed, and the "new" will be
renamed back.
The old and new functions diff:
```diff
--- /tmp/a 2023-12-14 09:02:57.804092696 -0800
+++ /tmp/b 2023-12-14 09:03:09.679999585 -0800
@@ -1,4 +1,4 @@
-kube::golang::old::setup_env() {
+kube::golang::new::setup_env() {
kube::golang::verify_go_version
# Set up GOPATH. We have tools which depend on being in a GOPATH (see
@@ -7,9 +7,9 @@
# Even in module mode, we need to set GOPATH for `go build` and `go install`
# to work. We build various tools (usually via `go install`) from a lot of
# scripts.
- # * We can't set GOBIN because that does not work on cross-compiles.
- # * We could use `go build -o <something>`, but it's subtle when it comes
- # to cross-compiles and whether the <something> is a file or a directory,
+ # * We can't just set GOBIN because that does not work on cross-compiles.
+ # * We could always use `go build -o <something>`, but it's subtle wrt
+ # cross-compiles and whether the <something> is a file or a directory,
# and EVERY caller has to get it *just* right.
# * We could leave GOPATH alone and let `go install` write binaries
# wherever the user's GOPATH says (or doesn't say).
@@ -20,16 +20,6 @@
#
# Eventually, when we no longer rely on run-in-gopath.sh we may be able to
# simplify this some.
- local go_pkg_dir="${KUBE_GOPATH}/src/${KUBE_GO_PACKAGE}"
- local go_pkg_basedir
- go_pkg_basedir=$(dirname "${go_pkg_dir}")
-
- mkdir -p "${go_pkg_basedir}"
-
- # TODO: This symlink should be relative.
- if [[ ! -e "${go_pkg_dir}" || "$(readlink "${go_pkg_dir}")" != "${KUBE_ROOT}" ]]; then
- ln -snf "${KUBE_ROOT}" "${go_pkg_dir}"
- fi
export GOPATH="${KUBE_GOPATH}"
# If these are not set, set them now. This ensures that any subsequent
@@ -40,24 +30,10 @@
# Make sure our own Go binaries are in PATH.
export PATH="${KUBE_GOPATH}/bin:${PATH}"
- # Change directories so that we are within the GOPATH. Some tools get really
- # upset if this is not true. We use a whole fake GOPATH here to collect the
- # resultant binaries.
- local subdir
- subdir=$(kube::realpath . | sed "s|${KUBE_ROOT}||")
- cd "${KUBE_GOPATH}/src/${KUBE_GO_PACKAGE}/${subdir}" || return 1
-
- # Set GOROOT so binaries that parse code can work properly.
- GOROOT=$(go env GOROOT)
- export GOROOT
-
# Unset GOBIN in case it already exists in the current session.
# Cross-compiles will not work with it set.
unset GOBIN
- # This seems to matter to some tools
- export GO15VENDOREXPERIMENT=1
-
- # Disable workspaces
- export GOWORK=off
+ # Explicitly turn on modules.
+ export GO111MODULE=on
}
```
Result: `make` works for k/k:
```
$ make kubectl
+++ [1211 11:07:31] Building go targets for linux/amd64
k8s.io/kubernetes/cmd/kubectl (static)
$ make WHAT=./cmd/kubectl/
+++ [1211 11:08:19] Building go targets for linux/amd64
k8s.io/kubernetes/./cmd/kubectl/ (non-static)
$ make WHAT=k8s.io/kubernetes/cmd/kubectl
+++ [1211 11:08:52] Building go targets for linux/amd64
k8s.io/kubernetes/cmd/kubectl (static)
```
Result: `make` works for staging by package:
```
$ make WHAT=k8s.io/api
+++ [1211 11:11:37] Building go targets for linux/amd64
k8s.io/api (non-static)
```
Result: `make` fails for staging by path:
```
$ make WHAT=./staging/src/k8s.io/api
+++ [1211 11:12:44] Building go targets for linux/amd64
k8s.io/kubernetes/./staging/src/k8s.io/api (non-static)
cannot find module providing package k8s.io/kubernetes/staging/src/k8s.io/api: import lookup disabled by -mod=vendor
(Go version in go.work is at least 1.14 and vendor directory exists.)
!!! [1211 11:12:44] Call tree:
!!! [1211 11:12:44] 1: /home/thockin/src/kubernetes/hack/lib/golang.sh:850 kube::golang::build_some_binaries(...)
!!! [1211 11:12:44] 2: /home/thockin/src/kubernetes/hack/lib/golang.sh:1012 kube::golang::build_binaries_for_platform(...)
!!! [1211 11:12:44] 3: hack/make-rules/build.sh:27 kube::golang::build_binaries(...)
!!! [1211 11:12:44] Call tree:
!!! [1211 11:12:44] 1: hack/make-rules/build.sh:27 kube::golang::build_binaries(...)
!!! [1211 11:12:44] Call tree:
!!! [1211 11:12:44] 1: hack/make-rules/build.sh:27 kube::golang::build_binaries(...)
make: *** [Makefile:96: all] Error 1
```
Result: `make test` fails:
```
$ make test WHAT=./cmd/kubectl
+++ [1211 11:13:38] Set GOMAXPROCS automatically to 6
+++ [1211 11:13:38] Running tests without code coverage and with -race
cmd/kubectl/kubectl.go:25:2: cannot find package "k8s.io/client-go/plugin/pkg/client/auth" in any of:
/home/thockin/src/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/client-go/plugin/pkg/client/auth (vendor tree)
/home/thockin/sdk/gotip/src/k8s.io/client-go/plugin/pkg/client/auth (from $GOROOT)
/home/thockin/src/kubernetes/_output/local/go/src/k8s.io/client-go/plugin/pkg/client/auth (from $GOPATH)
cmd/kubectl/kubectl.go:20:2: cannot find package "k8s.io/component-base/cli" in any of:
/home/thockin/src/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/component-base/cli (vendor tree)
/home/thockin/sdk/gotip/src/k8s.io/component-base/cli (from $GOROOT)
/home/thockin/src/kubernetes/_output/local/go/src/k8s.io/component-base/cli (from $GOPATH)
cmd/kubectl/kubectl.go:21:2: cannot find package "k8s.io/kubectl/pkg/cmd" in any of:
/home/thockin/src/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/kubectl/pkg/cmd (vendor tree)
/home/thockin/sdk/gotip/src/k8s.io/kubectl/pkg/cmd (from $GOROOT)
/home/thockin/src/kubernetes/_output/local/go/src/k8s.io/kubectl/pkg/cmd (from $GOPATH)
cmd/kubectl/kubectl.go:22:2: cannot find package "k8s.io/kubectl/pkg/cmd/util" in any of:
/home/thockin/src/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/kubectl/pkg/cmd/util (vendor tree)
/home/thockin/sdk/gotip/src/k8s.io/kubectl/pkg/cmd/util (from $GOROOT)
/home/thockin/src/kubernetes/_output/local/go/src/k8s.io/kubectl/pkg/cmd/util (from $GOPATH)
make: *** [Makefile:191: test] Error 1
```
GNU xargs has a `-r, --no-run-if-empty` option but I don't think we want
to depend on GNU (thanks, MacOS).
Why? Sometimes, when you are messing with codegens, you end up with an
empty input and then it just hangs.