Make the following changes:
- When dryrunning if the given kubeconfig does not exist
create a DryRun object without a real client. This means only
a fake client will be used for all actions.
- Skip the preflight check if manifests exist during dryrun.
Print "would ..." instead.
- Add new reactors that handle objects during upgrade.
- Add unit tests for new reactors.
- Print message on "upgrade node" that this is not a CP node
if the apiserver manifest is missing.
- Add a new function GetNodeName() that uses 3 different methods
for fetching the node name. Solves a long standing issue where
we only used the cert in kubelet.conf for determining node name.
- Various other minor fixes.
The current dryrun client implemnetation is suboptimal
and sparse. It has the following problems:
- When an object CREATE or UPDATE reaches the default dryrun client
the operation is a NO-OP, which means subsequent GET calls must
fully emulate the object that exists in the store.
- There are multiple implmentations of a DryRunGetter interface
such the one in init_dryrun.go but there are no implementations
for reset, upgrade, join.
- There is a specific DryRunGetter that is backed by a real
client in clientbacked_dryrun.go, but this is used for upgrade
and does not work in conjuction with a fake client.
This commit does the following changes:
- Removes all existing *dryrun*.go implementations.
- Add a new DryRun implementation in dryrun.go that implements
3 clients - fake clientset, real clientset, real dynamic client.
- The DryRun object uses the method chaining pattern.
- Allows the user opt-in into real clients only if needed, by passing
a real kubeconfig. By default only constructs a fake client.
- The default reactor chain for the fake client, always logs the
object action, then for GET or LIST actions attempts to use the
real dynamic client to get the object. If a real object does not
exist it attempts to get the object from the fake object store.
- The user can prepend or append reactors to the chain.
- All known needed reactors for operations during init, join,
reset, upgrade are added as methods of the DryRun struct.
- Adds detailed unit test for the DryRun struct and its methods
including reactors.
Additional changes:
- Use the new DryRun implementation in all command workflows -
init, join, reset, upgrade.
- Ensure that --dry-run works even if there is no active cluster
by returning faked objects. For join, a faked cluster-info
with a fake bootstrap token and CA are used.
With https://github.com/kubernetes/kubernetes/pull/122079,
kubeadm now relies on `ttlSecondsAfterFinished` to clean
up `upgrade-health-check` once its pod reaches a terminal state.
However, there is a case where the pod won't reach a terminal state and
the job will not register a terminal state, hence no garbage collection.
For example, if the pause image is not present, `ErrImagePull` will make
the pod keep retrying to pull the image and the pod will never reach a
terminal state on its own. And the job will continue to wait for the pod
to reach a terminal state which will not happen.
So we need to set `activeDeadlineSeconds` to prevent the job from
waiting forever for the pod to reach a terminal state.
Without this, users invoking `kubeadm upgrade plan` need to cleanup the
job outside of kubeadm even if they ignore the preflight result because
the job still runs when the result is configured to be ignored via
`--ignore-prelight-errors=CreateJob` flag.
Since the timeout for the polling in the `CreateJob` step in kubeadm
is 15 seconds, we should set the `activeDeadlineSeconds` to the same
timeout.
During kubeadm join in 1.30 kubeadm started respecting
the kubeletconfiguration healthz address/port. Previously
it hardcoded the health check to localhost:defaultport.
A corner case was not handled where the user applies --patches
on join to modify the local kubeletconfiguration. This results
in kubeletconfiguration patch target patches not being applied to
the KubeletConfiguration in memory and the health check
running on the address:port which are present in the kubelet-config
configmap.
Fix that by explicitly calling a new function to patch the
KubeletConfiguration in memory. This is scoped to only handle
the healthz checks *after* the kubelet config.yaml was already
patched and written to disk.
When the PublicKeysECDSA feature gate is used or the new
v1beta4.ClusterConfiguration.EncryptionAlgorithm field is used
with "ECDSA-P256" as value, make sure that this is reflected
in the "cert spec" used to generate private keys and they end
up as "EC keys".
When doing a kubelet health check on init/join, do not
hardcode the "localhost" address. Instead, use the
KubeletConfiguration HealthzBindAddress and HealthzPort
fields.
Currently if etcd.yaml does not have a diff on "kubeadm upgrade"
certificate renewal for it is also skipped.
Check if kube-apiserver.yaml needs an upgrade, if so and if
cert renewal is not disabled, renew etcd's certs and restart
its static pod.
Currently, there are some unit tests that are failing on
Windows due to various reasons:
- Cannot remove a directory if there's a file open in that directory.
- Paths may have / or \ on Windows.
Allow the user to pass custom cert validity period with
ClusterConfiguration.CertificateValidityPeriod and
CACertificateValidityPeriod.
The defaults remain 1 year for regular cert and 10 years for CA.
Show warnings if the provided values are more than the defaults.
Additional changes:
- In "certs show-expiration" use HumanDuration() to print
more detailed durations instead of ShortHumanDuration().
- Add a new kubeadm util GetStartTime() which can be used
to consistently get a UTC time for tasks like writing certs
and unit tests.
- Update unit tests to validate the new customizable NotAfter.