The function KubernetesReleaseVersion is being called in
a number of locations during unit tests but by default it
uses a "fetch version from URL" approach.
- Update the function to return a placeholder version
during unit tests.
- Update unit tests for this function.
- Update strings / comments in other version_tests.go
locations.
The improvement is significant:
time go test k8s.io/kubernetes/cmd/kubeadm/app/... -count=1
before:
real 2m47.733s
after:
real 0m10.234s
StaticPodMirroringTimeout and StaticPodMirroringRetryInterval
are use for just an API call to get Pods(). The already existing
constants.KubernetesAPICallRetryInterval
and kubeadmapi.GetActiveTimeouts().KubernetesAPICall.Duration
can be used for that instead.
Follow the same process of adding the Timeouts struct
to UpgradeConfiguration similarly to how it was done for
other API Kinds.
In the Timeouts struct include one new timeout:
- UpgradeManifests
The idempotency.go (perhaps not so accurately named) contains
API calls that kubeadm does against an API server using client-go.
Some users seem to have unstable setups where for unknown reasons
the API server can be unavailable or refuse to respond as expected.
Use PollUntilContextTimeout in all exported functions to ensure
such API calls are all retry-able.
NOTE: The context passed to PollUntilContextTimeout is not propagated
in the polled function. Instead the poll function creates it's own
context 'ctx := context.Background()', this is to avoid
breaking expectations on the side of the callers, that expect
a certain type of error and not "context timeout" errors.
Additional changes:
- Make all context.TODO() -> context.Background()
- Update all unit tests and make sure during testing the retry
interval and timeout are short. Test coverage of idempotency.go
is at ~97%.
- Remove the TestMutateConfigMapWithConflict test. It does not
contribute much, because conflict handling is done at the API,
server side, not on the side of kubeadm. This simulating this is not
needed.
WaitForAllControlPlaneComponents is a new feature gate
that can be used to tell kubeadm to wait for all control plane
components and not only kube-apiserver.
- Add the Waiter function WaitForControlPlaneComponents
that waits for all CP components in parallel. Uses the regular
healthz endpoint for checks of status 200.
- Add a new experimental phase to kubeadm join called "wait-control-plane".
A similar phase exists for kubeadm init.
Previous v1beta4 work added support for
ClusterConfiguration.EncryptionAlgorithm, however the possible
values were limited to just "RSA" (2048 key size) and "ECDSA" (P256).
Allow more arbitrary algorithm types, that can also include key size
or curve type encoded in the name:
"RSA-2048" (default), "RSA-3072", "RSA-4096" or "ECDSA-P256".
Update the deprecation notice of the PublicKeysECDSA FeatureGate
as ideally it should be removed only after v1beta3 is removed.
- Update the logic in checks.go to separate serial and parallel image
pulls.
- Add a new CRI function PullImagesInParallel() with a private
implementation.
- Unit test the private implementation.
- Update other unit tests in checks_test.go.
Intentionally pass a new context to this API call.
This will let the API call run independently of the parent
context timeout, which is quite short and can cause the API
call to return abruptly.
If some code is about to go over the context deadline,
"x/time/rate/rate.go" would return and untyped error with the string
"would exceed context deadline". If some code already exceeded
the deadline the error would be of type DeadlineExceeded.
Ignore such context errors and only store API and connectivity errors.
- Name the function GetConfigMapWithShortRetry to be
easier to understand that the function is with a very short timeout.
Add note that this function should be used in cases there is a
fallback to local config.
- Apply custom hardcoded interval of 50ms and timeout of 350ms to it.
Previously the fucntion used exp backoff with 5 steps up to ~340ms.
Switch to PollUntilContextTimeout() everywhere to allow
usage of the exposed timeouts in the kubeadm API. Exponential backoff
options are more difficult to expose in this regard and a bit too
detailed for the common user - i.e. have "steps", "factor" and so on.
Replace the usage of the deprecated wait.Poll() and
wait.PollImmediate() functions with wait.PollUntilContextTimeout().
Since we don't have piping of context around kubeadm,
use context.Background() everywhere.
Some wait.Poll() functions were converted to "immediate" as there
is no point for them to not be. This is done for consistency.
Replace the only instance of wait.JitterUntil with
wait.PollUntilContextTimeout. JitterUntil is not deprecated
but this is also done for consistency.
Currently, timeouts are only accessible if a kubeadm runtime.Object{}
like InitConfiguration is passed around.
Any time a config is loaded or defaulted, store the Timeouts
structure in a thread-safe way in the main kubeadm API package
with SetActiveTimeouts(). Optionally, a deep-copy can be
performed before calling SetActiveTimeouts(). Make this struct
accessible with GetActiveTimeouts(). Ensure these functions
are thread safe.
On init() make sure the struct is defaulted, so that unit
tests can work with these values.
When upconverting from v1beta3 to v1beta4, it appears there is no
easy way to migrate some of the timeout values such as:
ClusterConfiguration.APIServer.TimeoutForControlPlane
to a new location:
InitConfiguration.Timeouts.<some-timeout-field>
Yes, the internal InitConfiguratio does embed a ClusterConfiguration,
but during conversion the ClusterConfiguration is converted from an
empty source.
K8s' API machinery has ways to register custom conversion functions,
such as v1beta3.ClusterConfiguration -> internal.InitConfiguration,
but these must be triggered explicitly with a decoder.
The overall migration of fields seems very awkward.
There might be hacks around that, such as storing intermediate state,
while trying to make the fuzzer rountrip happy, but instead
mutation functions can be implemented for the internal types when
calling kubeadm's migrate code. This seems much cleaner.
The function TryRunCommand() uses an exponential backoff,
which is good, but it's inconsistent and only used in a couple
of places.
Remove its usage in the token.go#UpdateOrCreateTokens()
and switch to using the standard function used in other places -
PollUntilContextTimeout().
Remove wait.go#TryRunCommand(), as there are no other usages.
The function wait.go#WaitForKubeletAndFunc() has been used in
a number of places in kubeadm. It starts a go routine to wait for
the kubelet /healthz and in parallel starts another go routine
to wait for an custom function.
This logic is problematic. If kubeadm is waiting for the kubelet
in parallel with something that requires the kubelet, the right
solution would be to first wait for the kubelet in serial and only
then proceed with the other action. The parallelism here particularly
during "init" required a unwanted "initial timeout" of 40s, before
the kubelet waiting even starts. In most cases, this makes the kubelet
waiter to not even start, while the main point of waiting becomes
the "other action".
- Remove the function WaitForKubeletAndFunc() from the Waiter interface.
- Rename the function WaitForHealthyKubelet() to just WaitForKubelet()
to be consistent with the naming WaitForAPI().
- Update WaitForKubelet() to not use TryRunCommand() and instead
use PollUntilContextTimeout().
- Remove the "initial timeout" of 40s in WaitForKubelet().
- Make both WaitForKubelet() and WaitForAPI() use similar error
handling and output.
- Update all usage of WaitForKubelet() to be a serial call before
any other action, such as another wait* call.
- Make the default wait timeout for the kubelet
/healthz to be 1 minute (kubeadmconstants.DefaultKubeletTimeout).
- Apply updates to all implementations of the Waiter interface.