During kubeadm join in 1.30 kubeadm started respecting
the kubeletconfiguration healthz address/port. Previously
it hardcoded the health check to localhost:defaultport.
A corner case was not handled where the user applies --patches
on join to modify the local kubeletconfiguration. This results
in kubeletconfiguration patch target patches not being applied to
the KubeletConfiguration in memory and the health check
running on the address:port which are present in the kubelet-config
configmap.
Fix that by explicitly calling a new function to patch the
KubeletConfiguration in memory. This is scoped to only handle
the healthz checks *after* the kubelet config.yaml was already
patched and written to disk.
Currently, there are some unit tests that are failing on Windows due
to various reasons:
- IPVS proxy mode is not supported on Windows.
- pkg/kubelet/cri/remote was moved to cri-client.
v1beta3.ClusterConfiguration.APIServer.TimeoutForControlPlane
must be migrated to {Init|Join}Configuration.Timeouts.
.ControlPlaneComponentHealthCheck.
To achieve this sort of cross-Kind migration do the following:
- Use a temporary, thread-safe variable in timeoututils.go
- Make the order of GVKs in documentMapToInitConfiguration
deterministic.
When doing a kubelet health check on init/join, do not
hardcode the "localhost" address. Instead, use the
KubeletConfiguration HealthzBindAddress and HealthzPort
fields.
With the new `cri-client` staging repository it's finally possible to
decouple `kubeadm` from `crictl`.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
Add Upgrade{Apply|Node}Configuration.{ImagePullPolicy|ImagePullSerial}.
The same feature already exists in NodeRegistrationOptions for
{Init|Join}Configuration.
Allow the user to pass custom cert validity period with
ClusterConfiguration.CertificateValidityPeriod and
CACertificateValidityPeriod.
The defaults remain 1 year for regular cert and 10 years for CA.
Show warnings if the provided values are more than the defaults.
Additional changes:
- In "certs show-expiration" use HumanDuration() to print
more detailed durations instead of ShortHumanDuration().
- Add a new kubeadm util GetStartTime() which can be used
to consistently get a UTC time for tasks like writing certs
and unit tests.
- Update unit tests to validate the new customizable NotAfter.
The function KubernetesReleaseVersion is being called in
a number of locations during unit tests but by default it
uses a "fetch version from URL" approach.
- Update the function to return a placeholder version
during unit tests.
- Update unit tests for this function.
- Update strings / comments in other version_tests.go
locations.
The improvement is significant:
time go test k8s.io/kubernetes/cmd/kubeadm/app/... -count=1
before:
real 2m47.733s
after:
real 0m10.234s
StaticPodMirroringTimeout and StaticPodMirroringRetryInterval
are use for just an API call to get Pods(). The already existing
constants.KubernetesAPICallRetryInterval
and kubeadmapi.GetActiveTimeouts().KubernetesAPICall.Duration
can be used for that instead.
Follow the same process of adding the Timeouts struct
to UpgradeConfiguration similarly to how it was done for
other API Kinds.
In the Timeouts struct include one new timeout:
- UpgradeManifests
The idempotency.go (perhaps not so accurately named) contains
API calls that kubeadm does against an API server using client-go.
Some users seem to have unstable setups where for unknown reasons
the API server can be unavailable or refuse to respond as expected.
Use PollUntilContextTimeout in all exported functions to ensure
such API calls are all retry-able.
NOTE: The context passed to PollUntilContextTimeout is not propagated
in the polled function. Instead the poll function creates it's own
context 'ctx := context.Background()', this is to avoid
breaking expectations on the side of the callers, that expect
a certain type of error and not "context timeout" errors.
Additional changes:
- Make all context.TODO() -> context.Background()
- Update all unit tests and make sure during testing the retry
interval and timeout are short. Test coverage of idempotency.go
is at ~97%.
- Remove the TestMutateConfigMapWithConflict test. It does not
contribute much, because conflict handling is done at the API,
server side, not on the side of kubeadm. This simulating this is not
needed.
WaitForAllControlPlaneComponents is a new feature gate
that can be used to tell kubeadm to wait for all control plane
components and not only kube-apiserver.
- Add the Waiter function WaitForControlPlaneComponents
that waits for all CP components in parallel. Uses the regular
healthz endpoint for checks of status 200.
- Add a new experimental phase to kubeadm join called "wait-control-plane".
A similar phase exists for kubeadm init.
Previous v1beta4 work added support for
ClusterConfiguration.EncryptionAlgorithm, however the possible
values were limited to just "RSA" (2048 key size) and "ECDSA" (P256).
Allow more arbitrary algorithm types, that can also include key size
or curve type encoded in the name:
"RSA-2048" (default), "RSA-3072", "RSA-4096" or "ECDSA-P256".
Update the deprecation notice of the PublicKeysECDSA FeatureGate
as ideally it should be removed only after v1beta3 is removed.
- Update the logic in checks.go to separate serial and parallel image
pulls.
- Add a new CRI function PullImagesInParallel() with a private
implementation.
- Unit test the private implementation.
- Update other unit tests in checks_test.go.
Intentionally pass a new context to this API call.
This will let the API call run independently of the parent
context timeout, which is quite short and can cause the API
call to return abruptly.
If some code is about to go over the context deadline,
"x/time/rate/rate.go" would return and untyped error with the string
"would exceed context deadline". If some code already exceeded
the deadline the error would be of type DeadlineExceeded.
Ignore such context errors and only store API and connectivity errors.
- Name the function GetConfigMapWithShortRetry to be
easier to understand that the function is with a very short timeout.
Add note that this function should be used in cases there is a
fallback to local config.
- Apply custom hardcoded interval of 50ms and timeout of 350ms to it.
Previously the fucntion used exp backoff with 5 steps up to ~340ms.
Switch to PollUntilContextTimeout() everywhere to allow
usage of the exposed timeouts in the kubeadm API. Exponential backoff
options are more difficult to expose in this regard and a bit too
detailed for the common user - i.e. have "steps", "factor" and so on.