By default, check the KCM and scheduler on 127.0.0.1:<port>, as that is the
default --bind-address kubeadm uses for these components.
For kube-apiserver, take the value from APIEndpoint.AdvertiseAddress, which is
dynamically detected from the host, unless the user has explicitly passed
--advertise-address as an extra arg.
Read the <port> values for all components from the --secure-port flag
value if set; otherwise use the defaults.
Use /livez for the apiserver and scheduler. Add a TODO for the KCM to
switch to /livez as well.
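The resulting probe targets could look like the following minimal Go sketch; the Component struct, the helper, and the concrete advertise address are illustrative assumptions, while the ports are the upstream defaults (6443, 10259, 10257):

```go
package main

import "fmt"

// Component describes one probe target; this struct and the sample
// values below are illustrative, not kubeadm's actual types.
type Component struct {
	Name     string
	Addr     string // 127.0.0.1 for KCM/scheduler, AdvertiseAddress for the apiserver
	Port     int    // taken from --secure-port if set, otherwise the default
	Endpoint string
}

func probeURL(c Component) string {
	return fmt.Sprintf("https://%s:%d%s", c.Addr, c.Port, c.Endpoint)
}

func main() {
	for _, c := range []Component{
		{"kube-apiserver", "192.168.0.101", 6443, "/livez"},         // APIEndpoint.AdvertiseAddress
		{"kube-scheduler", "127.0.0.1", 10259, "/livez"},            // default --bind-address
		{"kube-controller-manager", "127.0.0.1", 10257, "/healthz"}, // TODO: switch to /livez
	} {
		fmt.Printf("%s -> %s\n", c.Name, probeURL(c))
	}
}
```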
- Split the code that tries to get the node name from a SSR into
a new function getNodeNameFromSSR() (see the sketch after this list).
Unit test the function.
- Fix an error where the "system:node:" prefix was not trimmed
from the username.
- Fix misleading errors around FetchInitConfigurationFromCluster.
This function performs multiple actions, and the "get node"
action can also return an apierrors.NotFound() error. This creates
confusion in the error returned by enforceRequirement during
upgrade. Fix this problem.
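A minimal sketch of the getNodeNameFromSSR() idea, assuming a SelfSubjectReview call against the API server; error handling is simplified and this is not the verbatim kubeadm code:

```go
package kubeadmutil

import (
	"context"
	"fmt"
	"strings"

	authv1 "k8s.io/api/authentication/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

const nodeUserPrefix = "system:node:" // username prefix for node identities

// getNodeNameFromSSR asks the API server who we are and derives the
// node name from the returned username.
func getNodeNameFromSSR(ctx context.Context, client kubernetes.Interface) (string, error) {
	ssr, err := client.AuthenticationV1().
		SelfSubjectReviews().
		Create(ctx, &authv1.SelfSubjectReview{}, metav1.CreateOptions{})
	if err != nil {
		return "", err
	}
	username := ssr.Status.UserInfo.Username
	if !strings.HasPrefix(username, nodeUserPrefix) {
		return "", fmt.Errorf("username %q does not have the %q prefix", username, nodeUserPrefix)
	}
	// The prefix must be trimmed, otherwise the node name is wrong.
	return strings.TrimPrefix(username, nodeUserPrefix), nil
}
```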
Make the following changes:
- When dry-running, if the given kubeconfig does not exist,
create a DryRun object without a real client. This means only
a fake client will be used for all actions.
- Skip the preflight check for existing manifests during dry-run.
Print "would ..." instead.
- Add new reactors that handle objects during upgrade.
- Add unit tests for new reactors.
- Print a message during "upgrade node" that this is not a CP node
if the apiserver manifest is missing.
- Add a new function GetNodeName() that uses 3 different methods
for fetching the node name (see the sketch after this list). This
solves a long-standing issue where we only used the cert in
kubelet.conf for determining the node name.
- Various other minor fixes.
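A hedged sketch of the GetNodeName() idea, reusing getNodeNameFromSSR() from the earlier sketch. The assumption here is that the three methods are the kubelet.conf client certificate, a SelfSubjectReview, and the OS hostname; the ordering, the fallback behavior, and the hypothetical helper nodeNameFromKubeletConf() are illustrative, not the verbatim kubeadm code:

```go
package kubeadmutil

import (
	"context"
	"os"
	"strings"

	"k8s.io/client-go/kubernetes"
)

// nodeNameFromKubeletConf is a hypothetical stand-in for the historical
// method: parsing the node name out of the X.509 client certificate
// embedded in kubelet.conf.
func nodeNameFromKubeletConf(path string) (string, error) { /* ... */ return "", nil }

// GetNodeName tries several sources instead of relying only on the
// kubelet.conf certificate.
func GetNodeName(ctx context.Context, client kubernetes.Interface, kubeletConfPath string) (string, error) {
	// 1. The client certificate in kubelet.conf (the old, single method).
	if name, err := nodeNameFromKubeletConf(kubeletConfPath); err == nil && name != "" {
		return name, nil
	}
	// 2. A SelfSubjectReview against the API server.
	if client != nil {
		if name, err := getNodeNameFromSSR(ctx, client); err == nil {
			return name, nil
		}
	}
	// 3. Fall back to the lowercased OS hostname.
	hostname, err := os.Hostname()
	if err != nil {
		return "", err
	}
	return strings.ToLower(hostname), nil
}
```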
The current dryrun client implementation is suboptimal
and sparse. It has the following problems:
- When an object CREATE or UPDATE reaches the default dryrun client,
the operation is a NO-OP, which means subsequent GET calls must
fully emulate the object that exists in the store.
- There are multiple implementations of the DryRunGetter interface,
such as the one in init_dryrun.go, but there are no implementations
for reset, upgrade and join.
- There is a specific DryRunGetter that is backed by a real
client in clientbacked_dryrun.go, but this is used only for upgrade
and does not work in conjunction with a fake client.
This commit does the following changes:
- Removes all existing *dryrun*.go implementations.
- Adds a new DryRun implementation in dryrun.go that wraps
3 clients - a fake clientset, a real clientset and a real dynamic client.
- The DryRun object uses the method chaining pattern (see the
sketch after this list).
- Allows the user to opt in to real clients only if needed, by passing
a real kubeconfig. By default only a fake client is constructed.
- The default reactor chain for the fake client always logs the
object action, then for GET or LIST actions attempts to use the
real dynamic client to fetch the object. If a real object does not
exist, it attempts to get the object from the fake object store.
- The user can prepend or append reactors to the chain.
- All reactors known to be needed for operations during init, join,
reset and upgrade are added as methods of the DryRun struct.
- Adds detailed unit tests for the DryRun struct and its methods,
including reactors.
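A minimal sketch of the described structure and chaining, assuming client-go's fake clientset and reactor chain; the exact kubeadm field and method signatures may differ:

```go
package apiclient

import (
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/kubernetes/fake"
	clienttesting "k8s.io/client-go/testing"
)

// DryRun always carries a fake clientset; the real clients stay nil
// unless the caller opts in with a real kubeconfig.
type DryRun struct {
	fakeClient    *fake.Clientset
	realClient    kubernetes.Interface // nil unless opted in
	dynamicClient dynamic.Interface    // nil unless opted in
}

func NewDryRun() *DryRun {
	return &DryRun{fakeClient: fake.NewSimpleClientset()}
}

// PrependReactor puts a reactor at the head of the fake client's chain
// and returns the receiver, enabling chaining such as:
//   NewDryRun().PrependReactor(r1).AppendReactor(r2)
func (d *DryRun) PrependReactor(r *clienttesting.SimpleReactor) *DryRun {
	d.fakeClient.Fake.ReactionChain = append(
		[]clienttesting.Reactor{r}, d.fakeClient.Fake.ReactionChain...)
	return d
}

// AppendReactor adds a reactor at the tail of the chain.
func (d *DryRun) AppendReactor(r *clienttesting.SimpleReactor) *DryRun {
	d.fakeClient.Fake.ReactionChain = append(d.fakeClient.Fake.ReactionChain, r)
	return d
}
```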
Additional changes:
- Use the new DryRun implementation in all command workflows -
init, join, reset, upgrade.
- Ensure that --dry-run works even if there is no active cluster,
by returning faked objects. For join, a faked cluster-info
with a fake bootstrap token and CA is used (see the sketch below).
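For illustration, a faked cluster-info could be served from a reactor like the following sketch, which could then be attached via the PrependReactor method from the DryRun sketch above; the abbreviated kubeconfig payload and the function name are assumptions:

```go
package apiclient

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	clienttesting "k8s.io/client-go/testing"
)

// clusterInfoReactor serves a faked cluster-info ConfigMap so that
// "join --dry-run" does not need an active cluster.
func clusterInfoReactor() *clienttesting.SimpleReactor {
	return &clienttesting.SimpleReactor{
		Verb:     "get",
		Resource: "configmaps",
		Reaction: func(action clienttesting.Action) (bool, runtime.Object, error) {
			get, ok := action.(clienttesting.GetAction)
			if !ok || get.GetNamespace() != "kube-public" || get.GetName() != "cluster-info" {
				return false, nil, nil // not ours; let the rest of the chain handle it
			}
			cm := &corev1.ConfigMap{
				ObjectMeta: metav1.ObjectMeta{Name: "cluster-info", Namespace: "kube-public"},
				Data: map[string]string{
					// A minimal kubeconfig carrying a fake CA and a fake
					// bootstrap token; the real payload is omitted here.
					"kubeconfig": "apiVersion: v1\nkind: Config\n# ...",
				},
			}
			return true, cm, nil
		},
	}
}
```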
With https://github.com/kubernetes/kubernetes/pull/122079,
kubeadm now relies on `ttlSecondsAfterFinished` to clean
up `upgrade-health-check` once its pod reaches a terminal state.
However, there are cases where the pod never reaches a terminal state,
so the job never registers a terminal state either and no garbage
collection happens.
For example, if the pause image is not present, `ErrImagePull` will make
the pod keep retrying the image pull, and the pod will never reach a
terminal state on its own. The job will then continue to wait for the pod
to reach a terminal state, which will never happen.
So we need to set `activeDeadlineSeconds` to prevent the job from
waiting forever for the pod to reach a terminal state.
Without this, users invoking `kubeadm upgrade plan` need to clean up the
job outside of kubeadm even if they ignore the preflight result, because
the job keeps running when the result is ignored via the
`--ignore-preflight-errors=CreateJob` flag.
Since the timeout for the polling in the `CreateJob` step in kubeadm
is 15 seconds, we should set `activeDeadlineSeconds` to the same
value.
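A sketch of the two Job fields in play, using the client-go batch/v1 types; the TTL value and the omitted pod template are placeholders rather than kubeadm's exact manifest:

```go
package upgrade

import (
	batchv1 "k8s.io/api/batch/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// healthCheckJob sketches the two fields discussed above.
func healthCheckJob() *batchv1.Job {
	ttl := int32(0)       // placeholder: delete the Job once it reaches a terminal state
	deadline := int64(15) // matches the 15-second CreateJob poll timeout
	return &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "upgrade-health-check",
			Namespace: "kube-system",
		},
		Spec: batchv1.JobSpec{
			// Garbage-collects the Job once it finishes...
			TTLSecondsAfterFinished: &ttl,
			// ...and forces a terminal state even if the pod is stuck,
			// e.g. retrying an image pull after ErrImagePull.
			ActiveDeadlineSeconds: &deadline,
		},
	}
}
```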
After the flags to control manifest paths were removed,
the variables in diff.go remained empty and the validation
fails with "empty manifest path".
Always populate the paths with the kubeadm defaults under
/etc/kubernetes/manifests and remove the empty path validation.
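A small sketch of the idea, with the directory constant taken from kubeadm's defaults and the helper name a hypothetical stand-in:

```go
package diff

import "path/filepath"

// defaultManifestDir is kubeadm's default static pod manifest directory.
const defaultManifestDir = "/etc/kubernetes/manifests"

// defaultManifestPath builds e.g. /etc/kubernetes/manifests/kube-apiserver.yaml,
// so the path variables are always populated and no empty-path check is needed.
func defaultManifestPath(component string) string {
	return filepath.Join(defaultManifestDir, component+".yaml")
}
```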
If an unknown command or phase is called, consistently
return the same error.
If a command that has subcommands is called directly,
return an error.
To achieve the above, add a new util function
RequireSubcommand() that sets NoArgs and RunE for
regular commands or a "phase" command (sketched below).
Remove MacroCommandLongDescription and just return an
error from the phase runner that a subcommand is required.
Fix minor capitalization issues in comments.
Perform other minor fixes in util/error.go.
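A hedged sketch of such a helper using the cobra API; the error wording and package name are assumptions, not the exact kubeadm implementation:

```go
package cmdutil

import (
	"fmt"

	"github.com/spf13/cobra"
)

// RequireSubcommand configures a command that only exists to host
// subcommands: calling it directly, or with an unknown subcommand,
// consistently returns an error.
func RequireSubcommand(cmd *cobra.Command) {
	// cobra.NoArgs turns an unknown subcommand into an
	// "unknown command" error.
	cmd.Args = cobra.NoArgs
	// RunE turns a bare invocation into an explicit error as well.
	cmd.RunE = func(c *cobra.Command, _ []string) error {
		return fmt.Errorf("please specify a subcommand for %q", c.CommandPath())
	}
}
```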
The flags for control-plane component manifests under 'upgrade diff'
and the --feature-gates flag were deprecated and NO-OP in 1.31
and can be removed in 1.32.