Commit Graph

122 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
3dcad5f0db Merge pull request #128532 from neolit123/1.32-handle-custom-addreses-comp-readyz
kubeadm: use advertise address for WaitForAllControlPlaneComponents
2024-11-06 08:51:29 +00:00
Lubomir I. Ivanov
0cfcaa82e1 kubeadm: use advertise address for WaitForAllControlPlaneComponents 2024-11-05 09:00:38 +02:00
Kubernetes Prow Robot
6fce566781 Merge pull request #128474 from neolit123/1.32-handle-custom-addreses-comp-readyz
kubeadm: use actual addresses/ports for WaitForAllControlPlaneComponents
2024-11-02 17:19:26 +00:00
Lubomir I. Ivanov
b2741f7b1c kubeadm: use actual addresses/ports for WaitForAllControlPlaneComponents
By default check the KCM and scheduler on 127.0.0.1:<port> as that is the
defaall --bind-address kubeamd uses for these components.

For kube-apiserver take the value from APIEndpoint.AdvertiseAddress which is
dynamically detected from the host. Unless the user has passed explicitly --advertise-address
as an extra arg.

Read the <port> values for all components from the --secure-port flag
value if needed. Otherwise use defaults.

Use /livez for apiserver and scheduler. Add TODO for KCM to
switch to /livez as well.
2024-11-02 18:09:36 +02:00
Lubomir I. Ivanov
07918a59e8 kubeadm: support dryrunning upgrade wihout a real cluster
Make the following changes:
- When dryrunning if the given kubeconfig does not exist
create a DryRun object without a real client. This means only
a fake client will be used for all actions.
- Skip the preflight check if manifests exist during dryrun.
Print "would ..." instead.
- Add new reactors that handle objects during upgrade.
- Add unit tests for new reactors.
- Print message on "upgrade node" that this is not a CP node
if the apiserver manifest is missing.
- Add a new function GetNodeName() that uses 3 different methods
for fetching the node name. Solves a long standing issue where
we only used the cert in kubelet.conf for determining node name.
- Various other minor fixes.
2024-10-31 14:58:47 +02:00
SataQiu
dc48aed791 kubeadm: support joining control plane nodes in dryrun mode without a real initialized control plane 2024-10-28 21:37:58 +08:00
Lubomir I. Ivanov
30f9893374 kubeadm: refactor the dry-run logic
The current dryrun client implemnetation is suboptimal
and sparse. It has the following problems:

- When an object CREATE or UPDATE reaches the default dryrun client
the operation is a NO-OP, which means subsequent GET calls must
fully emulate the object that exists in the store.
- There are multiple implmentations of a DryRunGetter interface
such the one in init_dryrun.go but there are no implementations
for reset, upgrade, join.
- There is a specific DryRunGetter that is backed by a real
client in clientbacked_dryrun.go, but this is used for upgrade
and does not work in conjuction with a fake client.

This commit does the following changes:

- Removes all existing *dryrun*.go implementations.
- Add a new DryRun implementation in dryrun.go that implements
3 clients - fake clientset, real clientset, real dynamic client.
- The DryRun object uses the method chaining pattern.
- Allows the user opt-in into real clients only if needed, by passing
a real kubeconfig. By default only constructs a fake client.
- The default reactor chain for the fake client, always logs the
object action, then for GET or LIST actions attempts to use the
real dynamic client to get the object. If a real object does not
exist it attempts to get the object from the fake object store.
- The user can prepend or append reactors to the chain.
- All known needed reactors for operations during init, join,
reset, upgrade are added as methods of the DryRun struct.
- Adds detailed unit test for the DryRun struct and its methods
including reactors.

Additional changes:
- Use the new DryRun implementation in all command workflows -
init, join, reset, upgrade.
- Ensure that --dry-run works even if there is no active cluster
by returning faked objects. For join, a faked cluster-info
with a fake bootstrap token and CA are used.
2024-10-11 00:15:59 +03:00
Lubomir I. Ivanov
b90b280c5a kubeadm: fix join bug where kubeletconfig was not patched in memory
During kubeadm join in 1.30 kubeadm started respecting
the kubeletconfiguration healthz address/port. Previously
it hardcoded the health check to localhost:defaultport.

A corner case was not handled where the user applies --patches
on join to modify the local kubeletconfiguration. This results
in kubeletconfiguration patch target patches not being applied to
the KubeletConfiguration in memory and the health check
running on the address:port which are present in the kubelet-config
configmap.

Fix that by explicitly calling a new function to patch the
KubeletConfiguration in memory. This is scoped to only handle
the healthz checks *after* the kubelet config.yaml was already
patched and written to disk.
2024-07-20 19:31:19 +03:00
Lubomir I. Ivanov
52302e4ad5 kubeadm: use the actual configured kubelet healthz address:port
When doing a kubelet health check on init/join, do not
hardcode the "localhost" address. Instead, use the
KubeletConfiguration HealthzBindAddress and HealthzPort
fields.
2024-06-01 10:10:31 +03:00
Kubernetes Prow Robot
03f24068da Merge pull request #123341 from neolit123/1.30-health-check-all-cp-components
kubeadm: introduce the WaitForAllControlPlaneComponents feature gate
2024-02-29 05:05:42 -08:00
Lubomir I. Ivanov
c29450eb00 kubeadm: apply retries to all API calls in idempotency.go
The idempotency.go (perhaps not so accurately named) contains
API calls that kubeadm does against an API server using client-go.

Some users seem to have unstable setups where for unknown reasons
the API server can be unavailable or refuse to respond as expected.

Use PollUntilContextTimeout in all exported functions to ensure
such API calls are all retry-able.

NOTE: The context passed to PollUntilContextTimeout is not propagated
in the polled function. Instead the poll function creates it's own
context 'ctx := context.Background()', this is to avoid
breaking expectations on the side of the callers, that expect
a certain type of error and not "context timeout" errors.

Additional changes:
- Make all context.TODO() -> context.Background()
- Update all unit tests and make sure during testing the retry
interval and timeout are short. Test coverage of idempotency.go
is at ~97%.
- Remove the TestMutateConfigMapWithConflict test. It does not
contribute much, because conflict handling is done at the API,
server side, not on the side of kubeadm. This simulating this is not
needed.
2024-02-18 13:14:32 +02:00
Lubomir I. Ivanov
7db7222592 kubeadm: introduce the WaitForAllControlPlaneComponents feature gate
WaitForAllControlPlaneComponents is a new feature gate
that can be used to tell kubeadm to wait for all control plane
components and not only kube-apiserver.

- Add the Waiter function WaitForControlPlaneComponents
that waits for all CP components in parallel. Uses the regular
healthz endpoint for checks of status 200.
- Add a new experimental phase to kubeadm join called "wait-control-plane".
A similar phase exists for kubeadm init.
2024-02-16 17:33:38 +02:00
xin.li
deec79ad8d kubeadm: increase ut coverage for apiclient/idempotency
Signed-off-by: xin.li <xin.li@daocloud.io>
2024-02-05 23:02:48 +08:00
Lubomir I. Ivanov
2cdd9a7130 kubeadm: use separate context in GetConfigMapWithShortRetry
Intentionally pass a new context to this API call.
This will let the API call run independently of the parent
context timeout, which is quite short and can cause the API
call to return abruptly.
2024-01-19 00:19:07 +02:00
Lubomir I. Ivanov
26a79e4c0b kubeadm: special case context errors in GetConfigMapWithShortRetry
If some code is about to go over the context deadline,
"x/time/rate/rate.go" would return and untyped error with the string
"would exceed context deadline". If some code already exceeded
the deadline the error would be of type DeadlineExceeded.
Ignore such context errors and only store API and connectivity errors.
2024-01-18 15:35:25 +02:00
Lubomir I. Ivanov
54a6e6a772 kubeadm: keep a function with short timeout in idempotency.go
- Name the function GetConfigMapWithShortRetry to be
easier to understand that the function is with a very short timeout.
Add note that this function should be used in cases there is a
fallback to local config.
- Apply custom hardcoded interval of 50ms and timeout of 350ms to it.
Previously the fucntion used exp backoff with 5 steps up to ~340ms.
2024-01-16 17:53:21 +02:00
Lubomir I. Ivanov
caf5311413 kubeadm: start using the Timeouts struct values
Propagate usage of the Timeout struct values.
Apply sanitazation to timeout constants in contants.go.
2024-01-14 15:07:56 +02:00
Lubomir I. Ivanov
374e41cf66 kubeadm: replace deprecated wait.Poll() and wait.PollImmediate()
Replace the usage of the deprecated wait.Poll() and
wait.PollImmediate() functions with wait.PollUntilContextTimeout().
Since we don't have piping of context around kubeadm,
use context.Background() everywhere.

Some wait.Poll() functions were converted to "immediate" as there
is no point for them to not be. This is done for consistency.

Replace the only instance of wait.JitterUntil with
wait.PollUntilContextTimeout. JitterUntil is not deprecated
but this is also done for consistency.
2024-01-14 15:07:55 +02:00
Lubomir I. Ivanov
32fbb23f3b kubeadm: remove usage of the TryRunCommand() function
The function TryRunCommand() uses an exponential backoff,
which is good, but it's inconsistent and only used in a couple
of places.

Remove its usage in the token.go#UpdateOrCreateTokens()
and switch to using the standard function used in other places -
PollUntilContextTimeout().

Remove wait.go#TryRunCommand(), as there are no other usages.
2023-12-20 08:51:00 +02:00
Lubomir I. Ivanov
557118897d kubeadm: drop concurrency when waiting for kubelet /healthz
The function wait.go#WaitForKubeletAndFunc() has been used in
a number of places in kubeadm. It starts a go routine to wait for
the kubelet /healthz and in parallel starts another go routine
to wait for an custom function.

This logic is problematic. If kubeadm is waiting for the kubelet
in parallel with something that requires the kubelet, the right
solution would be to first wait for the kubelet in serial and only
then proceed with the other action. The parallelism here particularly
during "init" required a unwanted "initial timeout" of 40s, before
the kubelet waiting even starts. In most cases, this makes the kubelet
waiter to not even start, while the main point of waiting becomes
the "other action".

- Remove the function WaitForKubeletAndFunc() from the Waiter interface.
- Rename the function WaitForHealthyKubelet() to just WaitForKubelet()
to be consistent with the naming WaitForAPI().
- Update WaitForKubelet() to not use TryRunCommand() and instead
use PollUntilContextTimeout().
- Remove the "initial timeout" of 40s in WaitForKubelet().
- Make both WaitForKubelet() and WaitForAPI() use similar error
handling and output.
- Update all usage of WaitForKubelet() to be a serial call before
any other action, such as another wait* call.
- Make the default wait timeout for the kubelet
/healthz to be 1 minute (kubeadmconstants.DefaultKubeletTimeout).
- Apply updates to all implementations of the Waiter interface.
2023-12-20 08:51:00 +02:00
Kubernetes Prow Robot
589d6f3886 Merge pull request #117630 from skitt/intstr-fromint32-cluster-lifecycle
Cluster lifecycle: use new intstr functions
2023-05-19 08:50:30 -07:00
Stephen Kitt
4c83aae2cc kubeadm: replace intstr.FromInt with intstr.FromInt32
This touches cases where FromInt() is used on numeric constants, or
values which are already int32s, or int variables which are defined
close by and can be changed to int32s with little impact.

Signed-off-by: Stephen Kitt <skitt@redhat.com>
2023-05-01 09:17:50 +02:00
cui fliter
1359ebcc5b fix doc mismatch
Signed-off-by: cui fliter <imcusg@gmail.com>
2023-04-16 18:29:45 +08:00
Lubomir I. Ivanov
54b73deaca kubeadm: handle dry run GET actions from fake discovery
The kubeadm dry run client reactor code is flawed as it assumes
all invoked "get" verb actions can be casted to GetAction.
Apparently that is not the case when Discovery().ServerVersion()
and other discovery calls are made. In such cases the action
type is the bare ActionImpl.

Catch if an action can be casted to ActionImpl and construct a
GetAction from it. GetActionImpl only suppersets ActionImpl with
a Name field (empty string in this case).

Add unit test for Discovery().ServerVersion().
2022-12-21 11:49:59 +02:00
QuantumEnergyE
847a39afc0 Retry patch when then service is unavailable or timeout. 2022-11-29 23:09:31 +08:00
SataQiu
d4cafe4738 kubeadm: optimize and make the usage consistent about apierrors.IsNotFound 2022-10-13 23:23:53 +08:00
SataQiu
61cd585ad2 kubeadm: remove redundant import alias and unused apiclient util funcs 2022-09-28 12:36:54 +08:00
B Aravind
b307321c0a I have evaluated TODO retry remove if feasible (#112383)
Hi team, hope u all doing well.

I have checked TODO that to remove "retry" if feasible but it's important i think that it shouldn't be removed because it was used in every file on your repo.

Update idempotency.go

Update idempotency.go

Update idempotency.go
2022-09-13 22:31:00 -07:00
cndoit18
ec43037d0f style: remove redundant judgment
Signed-off-by: cndoit18 <cndoit18@outlook.com>
2022-08-25 12:07:36 +08:00
XuzhengChang
7824316e89 Print getStaticPodSingleHash err message 2022-03-02 09:34:12 +08:00
SataQiu
2c5aef9036 kubeadm: fix the bug that 'kubeadm init --dry-run --upload-certs' command failed with 'secret not found' error 2022-02-09 12:58:02 +08:00
Monokaix
eab74f15a5 Remove unused arg of kubeadm/WaitForKubeletAndFunc 2021-12-25 09:12:00 +08:00
haoyun
a600e31c55 test: add test for PatchNode when error happend
Signed-off-by: haoyun <yun.hao@daocloud.io>
2021-10-19 11:01:01 +08:00
haoyun
bd8f26c2d7 fix: patchNode retry logic
Signed-off-by: haoyun <yun.hao@daocloud.io>
2021-10-17 12:36:36 +08:00
Antonio Ojea
0cd75e8fec run hack/update-netparse-cve.sh 2021-08-20 10:42:09 +02:00
XinYang
72fd01095d re-order imports for kubeadm
Signed-off-by: XinYang <xinydev@gmail.com>
2021-08-17 22:40:46 +08:00
XinYang
c2a8cd359f re-order the imports in kubeadm
Signed-off-by: XinYang <xinydev@gmail.com>

Update cmd/kubeadm/app/cmd/join.go

Co-authored-by: Lubomir I. Ivanov <neolit123@gmail.com>
2021-07-04 16:41:27 +08:00
Sandeep Rajan
b8a1bd6a6c remove the deprecated kube-dns as an option in kubeadm 2021-03-04 12:12:54 -05:00
Benjamin Elder
56e092e382 hack/update-bazel.sh 2021-02-28 15:17:29 -08:00
Xianglin Gao
04ef3628e3 refact CreateOrMutateConfigMap and MutateConfigMap with PollImmediate
Signed-off-by: Xianglin Gao <xianglin.gxl@alibaba-inc.com>
2020-06-11 00:31:22 +08:00
Xianglin Gao
6d572ea9b7 Add retries for CreateOrUpdateRoleBinding
Signed-off-by: Xianglin Gao <xianglin.gxl@alibaba-inc.com>
2020-06-10 00:23:46 +08:00
Xianglin Gao
052eb7d9a5 Add retries for CreateOrUpdateRole
Signed-off-by: Xianglin Gao <xianglin.gxl@alibaba-inc.com>
2020-06-10 00:12:25 +08:00
Jordan Liggitt
b7c2faf26c client-go dynamic client: add context to callers 2020-03-06 10:56:23 -05:00
Mike Danese
76f8594378 more artisanal fixes
Most of these could have been refactored automatically but it wouldn't
have been uglier. The unsophisticated tooling left lots of unnecessary
struct -> pointer -> struct transitions.
2020-03-05 14:59:47 -08:00
Taesun Lee
97fc3e6139 Fix typos in apiclient util
fix initalTimeout to initialTimeout
2020-02-20 15:20:04 +09:00
Mike Danese
25651408ae generated: run refactor 2020-02-08 12:30:21 -05:00
Mike Danese
3aa59f7f30 generated: run refactor 2020-02-07 18:16:47 -08:00
Mike Danese
d55d6175f8 refactor 2020-01-29 08:50:45 -08:00
Rafael Fernández López
14fe7225c1 kubeadm: Improve resiliency in CreateOrMutateConfigMap
CreateOrMutateConfigMap was not resilient when it was trying to Create
the ConfigMap. If this operation returned an unknown error the whole
operation would fail, because it was strict in what error it was
expecting right afterwards: if the error returned by the Create call
was a IsAlreadyExists error, it would work fine. However, if an
unexpected error (such as an EOF) happened, this call would fail.

We are seeing this error specially when running control plane node
joins in an automated fashion, where things happen at a relatively
high speed pace.

It was specially easy to reproduce with kind, with several control
plane instances. E.g.:

```
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
I1130 11:43:42.788952     887 round_trippers.go:443] POST https://172.17.0.2:6443/api/v1/namespaces/kube-system/configmaps?timeout=10s  in 1013 milliseconds
Post https://172.17.0.2:6443/api/v1/namespaces/kube-system/configmaps?timeout=10s: unexpected EOF
unable to create ConfigMap
k8s.io/kubernetes/cmd/kubeadm/app/util/apiclient.CreateOrMutateConfigMap
	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/util/apiclient/idempotency.go:65
```

This change makes this logic more resilient to unknown errors. It will
retry on the light of unknown errors until some of the expected error
happens: either `IsAlreadyExists`, in which case we will mutate the
ConfigMap, or no error, in which case the ConfigMap has been created.
2019-11-30 22:48:16 +01:00
Jordan Liggitt
752cda4fc4 guard kubeadm dependencies on k8s.io/kubernetes 2019-11-13 15:05:11 -05:00