Commit Graph

73 Commits

Author SHA1 Message Date
SataQiu
2dc0d2962a kubeadm: fix a bug where the RemoveMember function did not return the correct member list when the member to be removed did not exist 2024-09-26 14:29:30 +08:00
SataQiu
9af1b25bec kubeadm: check the member list status before adding or removing an etcd member 2024-09-24 22:53:42 +08:00
SataQiu
18318a32ce kubeadm: check whether the peer URL for the added etcd member already exists when the MemberAddAsLearner/MemberAdd fails 2024-09-20 11:52:47 +08:00
Lubomir I. Ivanov
99313bea88 kubeadm: remove constants for mirror pod timeout
StaticPodMirroringTimeout and StaticPodMirroringRetryInterval
are use for just an API call to get Pods(). The already existing
constants.KubernetesAPICallRetryInterval
and kubeadmapi.GetActiveTimeouts().KubernetesAPICall.Duration
can be used for that instead.
2024-03-01 13:04:08 +02:00
Kubernetes Prow Robot
821c0ef61e Merge pull request #123489 from yxxhero/print_etcd_ready_status
feat: print etcd ready status
2024-02-25 07:42:03 -08:00
yxxhero
e6d6d8e14c feat: print etcd ready status
Signed-off-by: yxxhero <aiopsclub@163.com>
2024-02-25 20:56:47 +08:00
Lubomir I. Ivanov
5f876b9d0a kubeadm: switch from ExponentialBackoff() to PollUntilContextTimeout()
Switch to PollUntilContextTimeout() everywhere to allow
usage of the exposed timeouts in the kubeadm API. Exponential backoff
options are more difficult to expose in this regard and a bit too
detailed for the common user - i.e. have "steps", "factor" and so on.
2024-01-14 15:07:56 +02:00
xin.li
430fd83454 kubeadm: increase ut coverage for util/etcd
Signed-off-by: xin.li <xin.li@daocloud.io>
2023-12-25 11:14:05 +08:00
Lubomir I. Ivanov
c2a04fa1cf kubeadm: fix export comments to make golangci-lint happy 2023-10-25 19:35:10 +03:00
xin.li
20db4ef3d6 kubeadm: fix wrong ut for util/etcd
Signed-off-by: xin.li <xin.li@daocloud.io>
2023-10-07 21:57:20 +08:00
Paco Xu
3a2c4d6f09 kubeadm: fix nil pointer when etcd member is already removed 2023-08-04 11:37:00 +08:00
Daniel Lipovetsky
ef9f8d7c0c kubeadm: Remove leading zeros from etcd member ID in log messages 2023-05-12 17:38:44 -07:00
Daniel Lipovetsky
ff4c6916ec kubeadm: Fix log message when etcd member is added as learner 2023-05-12 17:38:44 -07:00
Daniel Lipovetsky
5fd5768ef3 kubeadm: Make etcd member removal idempotent
If the etcd member is not found, then it has already been removed, and
kubeadm reset should immediately complete the 'remove-etcd-member'
phase. Previously, the phase would complete only once the
exponential-backoff retry expired, up to 3 minutes duration.

This commit also fixes a semantic error in etcd.GetMemberID. Previously,
the function returned 0 if no member was found, but 0 is not a valid
member ID.
2023-05-10 09:13:31 -07:00
Daniel Lipovetsky
05b3449346 kubeadm: Add etcd client unit tests 2023-05-08 13:35:03 -07:00
Daniel Lipovetsky
fc1b228779 kubeadm: Use internal etcd client through an interface 2023-05-08 13:35:03 -07:00
Tobias Giese
ea46c91868 kubeadm: promote member after the static pod manifest was written
Signed-off-by: Tobias Giese <tobias.giese@mercedes-benz.com>
Co-authored-by: Christian Schlotter <christi.schlotter@gmail.com>
2023-01-16 11:11:58 +01:00
Paco Xu
37f5da904b kubeadm: remove nested loops for member promotion 2022-12-17 12:40:15 +08:00
Paco Xu
b3deecfb17 add etcd as learner mode and promote when fg EtcdLearnerMode is enabled
- use etcd backoff to wait; still has many warning messages
Signed-off-by: Paco Xu <paco.xu@daocloud.io>
2022-12-16 21:09:59 +08:00
XinYang
72fd01095d re-order imports for kubeadm
Signed-off-by: XinYang <xinydev@gmail.com>
2021-08-17 22:40:46 +08:00
Ian Gann
c8431f42d9 kubeadm: Reduce the backoff time of AddMember for etcd
This change optimizes the kubeadm/etcd `AddMember` client-side function
by stopping early in the backoff loop when a peer conflict is found
(indicating the member has already been added to the etcd cluster). In
this situation, the function will stop early and relay a call to
`ListMembers` to fetch the current list of members to return. With this
optimization, front-loading a `ListMembers` call is no longer necessary,
as this functionally returns the equivalent response.

This helps reduce the amount of time taken in situational cases where an
initial client request to add a member is accepted by the server, but
fails client-side.

This situation is possible situationally, such as if network latency
causes the request to timeout after it was sent and accepted by the
cluster. In this situation, the following loop would occur and fail with
an `ErrPeerURLExist` response, and would be stuck until the backoff
timeout was met (roughly ~2min30sec currently).

Testing Done:

* Manual testing with an etcd cluster. Initial "AddMember` call was
  successful, and the etcd manifest file was identical to prior version
  of these files. Subsequent calls to add the same member succeeded
  immediately (retaining idempotency), and the resulting manifest file
  remains identical to previous version as well. The difference, this
  time, is the call finished ~2min25sec faster in an identical test in
  the environment tested with.
2021-08-05 13:11:42 -07:00
XinYang
c2a8cd359f re-order the imports in kubeadm
Signed-off-by: XinYang <xinydev@gmail.com>

Update cmd/kubeadm/app/cmd/join.go

Co-authored-by: Lubomir I. Ivanov <neolit123@gmail.com>
2021-07-04 16:41:27 +08:00
Jordan Liggitt
2979c3325e Switch to go.etcd.io/etcd/client/v3 2021-06-15 09:53:06 -04:00
Lubomir I. Ivanov
8b9d0dceb1 kubeadm: remove the ClusterStatus object from v1beta3
- Remove the object form v1beta3 and internal type
- Deprecate a couple of phases that were specifically designed / named to
modify the ClusterStatus object
- Adapt logic around annotation vs ClusterStatus retrieval
- Update unit tests
- Run generators
2021-05-17 19:27:36 +03:00
Benjamin Elder
56e092e382 hack/update-bazel.sh 2021-02-28 15:17:29 -08:00
Lubomir I. Ivanov
ebf163684a kubeadm: adjust the logic around etcd data directory creation
- Ensure the directory is created with 0700 via a new function
called CreateDataDirectory().
- Call this function in the init phases instead of the manual call
to MkdirAll.
- Call this function when joining control-plane nodes with local etcd.

If the directory creation is left to the kubelet via the
static Pod hostPath mounts, it will end up with 0755
which is not desired.
2020-09-03 18:38:54 +03:00
SataQiu
800dd19fc2 increase robustness for kubeadm etcd operations
Signed-off-by: SataQiu <1527062125@qq.com>
2020-06-15 22:43:21 +08:00
Kubernetes Prow Robot
02637bb250 Merge pull request #91145 from tnqn/kubeadm-reset-error
kubeadm: skip removing last etcd member in reset phase
2020-05-27 15:04:01 -07:00
Quan Tian
9cc416e7df kubeadm: do not remove the only remaining etcd member during reset
If this is the only remaining stacked etcd member in the cluster,
calling RemoveMember() is not needed.
2020-05-21 02:12:36 +08:00
Davanum Srinivas
07d88617e5 Run hack/update-vendor.sh
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-05-16 07:54:33 -04:00
Davanum Srinivas
442a69c3bd switch over k/k to use klog v2
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-05-16 07:54:27 -04:00
Lubomir I. Ivanov
1c430ff30f kubeadm: fix flakes when performing etcd MemberAdd on slower setups
In slower setups it can take more time for the existing cluster
to be in a healthy state, so the existing backoff of ~50 seconds
is apparently not sufficient.

The client dial can also fail for similar reasons.

Improve kubeadm's join toleration of adding new etcd members.
Wrap both the client dial and member add in a longer backoff
(up to ~200 seconds).

This particular change should be backported to the support skew.
In a future change for master, all etcd client operations should be
make consistent so that the etcd logic is in a sane state.
2020-04-30 18:53:29 +03:00
SataQiu
004a61a46c kubeadm: fix some mistakes about log output 2020-04-15 14:32:46 +08:00
Rostislav M. Georgiev
c8b7e5739c kubeadm: Use image tag as version of stacked etcd
kubeadm uses image tags (such as `v3.4.3-0`) to specify the version of
etcd. However, the upgrade code in kubeadm uses the etcd client API to
fetch the currently deployed version. The result contains only the etcd
version without the additional information (such as image revision) that
is normally found in the tag. As a result it would refuse an upgrade
where the etcd versions match and the only difference is the image
revision number (`v3.4.3-0` to `v3.4.3-1`).

To fix the above issue, the following changes are done:
- Replace the existing etcd version querying code, that uses the etcd
  client library, with code that returns the etcd image tag from the
  local static pod manifest file.
- If an etcd `imageTag` is specified in the ClusterConfiguration during
  upgrade, use that tag instead. This is done regardless if the tag was
  specified in the configuration stored in the cluster or with a new
  configuration supplied by the `--config` command line parameter.
  If no custom tag is specified, kubeadm will select one depending on
  the desired Kubernetes version.
- `kubeadm upgrade plan` no longer prints upgrade information about
  external etcd. It's the user's responsibility to manage it in that
  case.

Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
2020-03-30 16:28:45 +03:00
Rafael Fernández López
3e59a0651f kubeadm: optimize the upgrade path from ClusterStatus to annotations
When doing the very first upgrade from a cluster that contains the
source of truth in the ClusterStatus struct, the new kubeadm logic
will try to retrieve this information from annotations.

This changeset adds to both etcd and apiserver endpoint retrieval the
special case in which they won't retry if we are in such cases. The
logic will retry if we find any unknown error, but will not retry in
the following cases:

- etcd annotations do not contain etcd endpoints, but the overall list
  of etcd pods is greater than 0. This means that we listed at least
  one etcd pod, but they are missing the annotation.

- API server annotation is not found on the api server pod for a given
  node name, but no errors aside from that one were found. This means
  that the API server pod is present, but is missing the annotation.

In both cases there is no point in retrying, and so, this speeds up the
upgrade path when coming from a previous existing cluster.
2020-02-20 12:19:05 +01:00
Rafael Fernández López
b140c5d64b kubeadm: remove ClusterStatus dependency
While `ClusterStatus` will be maintained and uploaded, it won't be
used by the internal `kubeadm` logic in order to determine the etcd
endpoints anymore.

The only exception is during the first upgrade cycle (`kubeadm upgrade
apply`, `kubeadm upgrade node`), in which we will fallback to the
ClusterStatus to let the upgrade path add the required annotations to
the newly created static pods.
2020-02-20 12:18:56 +01:00
Lubomir I. Ivanov
a027c379f7 kubeadm: increase timeouts in the etcd client
- Extend the exponential backoff for add/remove/... retry to
11 steps ~=106 seconds. From experiments for 3 and more members
the race can take more that ~=26 seconds.
- Increase the dialTimeout for client creation to 40 seconds.
20 seconds seems racy for 3 and more members.
2020-01-25 00:48:05 +02:00
Lubomir I. Ivanov
5e0c0779a1 kubeadm: handle multiple members without names during concurrent join
For the etcd client, amend AddMember() to handle a very
rare bug when multiple members can end up with the same
name. Match the member peer address and assign it the name of
the member we are adding. For the rest of the members with missing
names use their member IDs as name. The etcd node is not disrupted
by the unknown names.

The important aspects are:
- The number of members of the initial cluster must match
the members in the cluster.
- The member we are current adding is present in the initial cluster.
2020-01-25 00:48:05 +02:00
SataQiu
72559ec693 kubeadm upgrades always persist the etcd backup for stacked 2020-01-06 12:34:28 +08:00
fabriziopandini
0573a2227f add retry to etcd operations 2019-11-14 09:27:03 +01:00
Wenjia Zhang
660b17d0ae Pin dependencies and update vendors 2019-10-24 14:09:24 -07:00
Wenjia Zhang
9ead9373f3 Resolve uncompatibility from update: etcd CAFile -> TrustedCAFIle 2019-10-24 14:09:24 -07:00
Wenjia Zhang
3b274fad2a Replace github.com/coreos/etcd by go.etcd.io/etcd 2019-10-24 14:09:24 -07:00
Kubernetes Prow Robot
7e060eec79 Merge pull request #81908 from tedyu/etcd-cluster-avail
Remove Client#ClusterAvailable from interface
2019-09-10 17:42:46 -07:00
Gyuho Lee
93b9545f48 vendor: update with "update-vendor.sh" script
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-08-29 08:46:02 -07:00
Gyuho Lee
eb1509a1d3 kubeadm/app/util/etcd: : block etcd client creation until connection is up
The new etcd balancer (>3.3.14, 3.4.0) uses an asynchronous resolver for
endpoints. Without "WithBlock", the client may return before the
connection is up.

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
2019-08-29 08:38:29 -07:00
Ted Yu
2167321adb Remove Client#ClusterAvailable from interface 2019-08-29 07:40:34 -07:00
Lubomir I. Ivanov
25668531f7 kubeadm: run MemberAdd/Remove for etcd clients with exp-backoff retry
When adding a new etcd member the etcd cluster can enter a state
of vote, where any new members added at the exact same time will
fail with an error right away.

Implement exponential backoff retry around the MemberAdd call.

This solves a kubeadm problem when concurrently joining
control-plane nodes with stacked etcd members.

From experiment, a few retries with milliseconds apart are
sufficient to achieve the concurrent join of a 3xCP cluster.

Apply the same backoff to MemberRemove in case the concurrent
removal of members fails for similar reasons.
2019-07-03 03:26:30 +03:00
Lubomir I. Ivanov
6f6b364b9c kubeadm: update output of init, join reset commands
- move most unrelated to phases output to klog.V(1)
- rename some prefixes for consistency - e.g.
[kubelet] -> [kubelet-start]
- control-plane-prepare: print details for each generated CP
component manifest.
- uppercase the info text for all "[reset].." lines
- modify the text for one line in reset
2019-03-06 03:17:35 +02:00
RA489
a0ee4b471d Refactor etcd client function have same signatures in etcd.go 2019-02-25 12:54:12 +05:30