Install
Proxmox Cloud Controller Manager (CCM) supports the following controllers:
- cloud-node
- cloud-node-lifecycle
cloud-node - detects new nodes launched in the cluster and registers them.
Assigns labels and taints based on the Proxmox VM configuration.
cloud-node-lifecycle - detects node deletion on the Proxmox side and removes the deleted nodes from the cluster.
Requirements
You need to set --cloud-provider=external in the kubelet arguments on all nodes in the cluster.
The flag tells the kubelet to offload cloud-specific responsibilities to an external component such as Proxmox CCM.
kubelet --cloud-provider=external
Otherwise, kubelet will attempt to manage the node's lifecycle by itself, which can cause issues in environments using an external Cloud Controller Manager (CCM).
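To confirm that a node's kubelet was actually started with the flag, a small check along the following lines may help. This is only a sketch: the function name has_external_cloud_provider is hypothetical, and it inspects a command line passed in as a string, so how you obtain that string (ps, systemd unit, Talos machine config) depends on your setup.

```shell
# Return success if a kubelet command line contains --cloud-provider=external.
# The command line is passed as one string so the check can be tried offline.
has_external_cloud_provider() {
  case " $1 " in
    *" --cloud-provider=external "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example on a node (assumes kubelet runs as a regular process visible to ps):
#   has_external_cloud_provider "$(ps -o args= -C kubelet)" && echo "flag is set"
```

The check only recognizes the --cloud-provider=external spelling used in this guide.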
Optional
# ${IP} can be a single IP or a comma-separated list of two IPs (dual stack).
kubelet --node-ip=${IP}
If your node has multiple IP addresses, you may need to set the --node-ip flag in the kubelet arguments to specify which IP address the kubelet should use.
This ensures that the correct IP address is used for communication between the node and other components in the Kubernetes cluster, especially in environments where multiple network interfaces or IP addresses are present.
# ${ID} has format proxmox://$REGION/$VMID.
kubelet --provider-id=${ID}
If CCM cannot determine the VMID, you may need to set the --provider-id flag in the kubelet arguments to specify the VM ID in Proxmox. This ensures that the CCM can manage the node by VM ID.
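As an illustration of the proxmox://$REGION/$VMID format, the value for --provider-id could be assembled as in the sketch below. The helper name proxmox_provider_id is hypothetical; the region is assumed to be the same string you use as region in the CCM config, and the VMID is assumed to be the numeric ID Proxmox shows for the VM.

```shell
# Build a provider ID in the proxmox://$REGION/$VMID format.
# $1 = region (the "region" value from the CCM config), $2 = numeric VMID
proxmox_provider_id() {
  case "$2" in
    ''|*[!0-9]*) echo "VMID must be numeric: '$2'" >&2; return 1 ;;
  esac
  printf 'proxmox://%s/%s\n' "$1" "$2"
}

# Example: kubelet --provider-id="$(proxmox_provider_id cluster-1 100)"
```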
# ${NODENAME} is the name of the node.
kubelet --hostname-override=${NODENAME}
If your node has a different hostname than the one registered in the cluster, you may need to set the --hostname-override flag in the kubelet arguments to specify the correct hostname.
Create a Proxmox token
Official documentation
# Create role CCM
pveum role add CCM -privs "VM.Audit Sys.Audit"
# Create user and grant permissions
pveum user add kubernetes@pve
pveum aclmod / -user kubernetes@pve -role CCM
pveum user token add kubernetes@pve ccm -privsep 0
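Before deploying CCM it can be useful to verify that the new token is accepted by the Proxmox API. The sketch below assumes curl is installed and uses two hypothetical helper names, pve_auth_header and pve_check_token; the API URL is whatever you will later put in the CCM config. Proxmox API tokens are sent in an Authorization header of the form PVEAPIToken=USER@REALM!TOKENID=SECRET.

```shell
# Build the Authorization header for a Proxmox API token.
# $1 = token ID (user@realm!tokenname), $2 = token secret
pve_auth_header() {
  printf 'Authorization: PVEAPIToken=%s=%s\n' "$1" "$2"
}

# Query the API version endpoint with the token.
# A JSON reply means the token was accepted; curl exits non-zero on HTTP errors.
# $1 = API base URL (for example https://host:8006/api2/json)
# $2 = token ID, $3 = token secret
pve_check_token() {
  curl --silent --show-error --fail \
    --header "$(pve_auth_header "$2" "$3")" \
    "$1/version"
}

# Example (single quotes keep the "!" in the token ID literal):
#   pve_check_token https://cluster-api-1.exmple.com:8006/api2/json 'kubernetes@pve!ccm' 'secret'
```

If the Proxmox endpoint uses a self-signed certificate, the call fails on TLS verification unless you pass the CA to curl.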
Deploy CCM
Create the proxmox credentials config file:
clusters:
  # List of Proxmox clusters, region must be unique
  - url: https://cluster-api-1.exmple.com:8006/api2/json
    insecure: false
    token_id: "kubernetes@pve!ccm"
    # Token from the previous step
    token_secret: "secret"
    # Region name, can be any string; it is used as the Kubernetes topology.kubernetes.io/region label
    region: cluster-1
See configuration documentation for more details.
Method 1: kubectl
Upload it to Kubernetes as a secret:
kubectl -n kube-system create secret generic proxmox-cloud-controller-manager --from-file=config.yaml
Deploy Proxmox CCM with cloud-node,cloud-node-lifecycle controllers
kubectl apply -f https://raw.githubusercontent.com/sergelogvinov/proxmox-cloud-controller-manager/main/docs/deploy/cloud-controller-manager.yml
Deploy Proxmox CCM with cloud-node-lifecycle controller (for Talos)
kubectl apply -f https://raw.githubusercontent.com/sergelogvinov/proxmox-cloud-controller-manager/main/docs/deploy/cloud-controller-manager-talos.yml
Method 2: helm chart
Create the config file
# proxmox-ccm.yaml
config:
  clusters:
    - url: https://cluster-api-1.exmple.com:8006/api2/json
      insecure: false
      token_id: "kubernetes@pve!ccm"
      token_secret: "secret"
      region: cluster-1
Deploy Proxmox CCM (deployment mode)
helm upgrade -i --namespace=kube-system -f proxmox-ccm.yaml \
proxmox-cloud-controller-manager \
oci://ghcr.io/sergelogvinov/charts/proxmox-cloud-controller-manager
Deploy Proxmox CCM (daemonset mode)
It makes sense to deploy on all control-plane nodes. Do not forget to set the nodeSelector.
helm upgrade -i --namespace=kube-system -f proxmox-ccm.yaml \
--set useDaemonSet=true \
proxmox-cloud-controller-manager \
oci://ghcr.io/sergelogvinov/charts/proxmox-cloud-controller-manager
More options are available in the Helm chart values.
Deploy CCM (Rancher)
Official documentation
Rancher RKE2 configuration:
machineGlobalConfig:
  # Kubelet predefined value --cloud-provider=external
  cloud-provider-name: external
  # Disable Rancher CCM
  disable-cloud-controller: true
Create the helm values file:
# proxmox-ccm.yaml
config:
  clusters:
    - url: https://cluster-api-1.exmple.com:8006/api2/json
      insecure: false
      token_id: "kubernetes@pve!ccm"
      token_secret: "secret"
      region: cluster-1
# Use host resolv.conf to resolve the Proxmox connection URL
useDaemonSet: true
# A nodeSelector is required in daemonset mode
nodeSelector:
  node-role.kubernetes.io/control-plane: ""
Deploy Proxmox CCM (daemonset mode)
helm upgrade -i --namespace=kube-system -f proxmox-ccm.yaml \
proxmox-cloud-controller-manager \
oci://ghcr.io/sergelogvinov/charts/proxmox-cloud-controller-manager
Deploy CCM with load balancer (optional)
This optional setup improves Proxmox API availability.
See load balancer for installation instructions.
Troubleshooting
How kubelet works with the flag cloud-provider=external:
- kubelet joins the cluster and sends the Node object to the API server. The Node object has the following values:
  - node.cloudprovider.kubernetes.io/uninitialized taint.
  - alpha.kubernetes.io/provided-node-ip annotation with the node IP.
  - nodeInfo field with system information.
- CCM detects the new node and sends a request to the Proxmox API to get the VM configuration, such as VMID, hostname, etc.
- CCM updates the Node object with labels, taints and the providerID field. The providerID is immutable and has the format proxmox://$REGION/$VMID; it cannot be changed after the first update.
- CCM removes the node.cloudprovider.kubernetes.io/uninitialized taint.
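To see the result of this sequence on a concrete node, you can read the providerID back and split it into its region and VMID parts. The following is a sketch under assumptions: the function names parse_provider_id and node_provider_id are hypothetical, and kubectl is assumed to be configured for the cluster.

```shell
# Split a providerID of the form proxmox://$REGION/$VMID into "REGION VMID".
parse_provider_id() {
  case "$1" in
    proxmox://*/*) ;;
    *) echo "not a Proxmox providerID: '$1'" >&2; return 1 ;;
  esac
  set -- "${1#proxmox://}"      # strip the scheme, leaving REGION/VMID
  printf '%s %s\n' "${1%%/*}" "${1#*/}"
}

# Print the providerID stored in a Node object ($1 = node name).
# An empty result suggests CCM has not initialized the node yet.
node_provider_id() {
  kubectl get node "$1" -o jsonpath='{.spec.providerID}'
}

# Example: parse_provider_id "$(node_provider_id worker-1)"
```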
If kubelet does not have the cloud-provider=external flag, it assumes that no external CCM is running and tries to manage the node lifecycle by itself.
This can cause issues with Proxmox CCM: CCM will skip the node and will not update the Node object.
If you modify the kubelet flags, it's recommended to check all workloads in the cluster.
Delete the node resource first, then restart the kubelet.
The steps to troubleshoot the Proxmox CCM:
- scale down the CCM deployment to 1 replica.
- set the log level to --v=5 in the deployment.
- check the logs.
- check the kubelet flag --cloud-provider=external, delete the node resource and restart the kubelet.
- check the logs.
- wait for 1 minute. If CCM cannot reach the Proxmox API, it will log the error.
- check taints, labels, and providerID in the Node object.
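The steps above could be wrapped in a few shell helpers, as in the sketch below. It rests on assumptions that may not match your install: the namespace kube-system, a Deployment named proxmox-cloud-controller-manager (daemonset mode would need a different resource), and the hypothetical function names ccm_scale_down, ccm_logs, ccm_delete_node and ccm_show_node. Changing the log level to --v=5 is left as a manual edit of the deployment arguments.

```shell
# Assumed names; override through the environment if your install differs.
CCM_NAMESPACE=${CCM_NAMESPACE:-kube-system}
CCM_DEPLOYMENT=${CCM_DEPLOYMENT:-proxmox-cloud-controller-manager}

# Run a single replica so all log lines come from one pod.
ccm_scale_down() {
  kubectl -n "$CCM_NAMESPACE" scale deployment "$CCM_DEPLOYMENT" --replicas=1
}

# Read the most recent CCM log lines.
ccm_logs() {
  kubectl -n "$CCM_NAMESPACE" logs "deployment/$CCM_DEPLOYMENT" --tail=100
}

# Delete the Node object ($1 = node name); restart kubelet on that node afterwards.
ccm_delete_node() {
  kubectl delete node "$1"
}

# Show providerID, taints and labels of a node ($1 = node name), one per line.
ccm_show_node() {
  kubectl get node "$1" \
    -o jsonpath='{.spec.providerID}{"\n"}{.spec.taints}{"\n"}{.metadata.labels}{"\n"}'
}
```

A typical session would call ccm_scale_down, ccm_logs, ccm_delete_node worker-1, restart the kubelet on the node, wait about a minute, and finish with ccm_logs and ccm_show_node worker-1.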