How to set up an Nvidia GPU on k3s/Kubernetes

An important observation before starting:

I'm only installing the drivers at the host level because I want to use that node's GPU for gaming as well, which means the drivers have to be installed at the host level and should not conflict with the node's workloads.

However, if that is not your case, you can install the Nvidia GPU Operator instead of following this tutorial.


It consists of the following steps:

  1. Verifying that your GPU is compatible with CUDA
  2. Installing the Nvidia proprietary driver on the node (at the host level)
  3. Installing the CUDA drivers
  4. Patching the driver so the GPU allows unrestricted simultaneous NVENC video encoding sessions (if you are using a consumer-grade GPU)
  5. Installing Nvidia's container toolkit on the node (at the host level; only compatible with containerd and Docker)
  6. Configuring k3s' containerd so the Nvidia container runtime can interact with the low-level interface
  7. Deploying gpu-feature-discovery along with node-feature-discovery to your cluster
  8. Creating a runtime class resource on the cluster
  9. Deploying the k8s-device-plugin chart as well
  10. Testing the setup with a pod that uses the new Nvidia runtime and runs a CUDA benchmark

Verifying GPU compatibility with CUDA
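
Nvidia keeps the list of CUDA-capable GPUs at https://developer.nvidia.com/cuda-gpus. As a quick sanity check on the node itself, you can also confirm the kernel sees the card with lspci (a minimal sketch; it assumes the pciutils package is installed):

lspci | grep -i nvidia

If your GPU model shows up both in that list and in the lspci output, you can move on to installing the driver.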

Installing Nvidia's proprietary Driver

  • I recommend checking what the latest available driver for your GPU is and downloading it through this table's links.
  • Follow these steps to install the file you have downloaded from the table (there is also a rough sketch of the manual installation after this list).
  • Reboot your computer/server.
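
If you downloaded the official .run installer, the manual installation is roughly the following (a sketch only; the exact file name depends on the driver version you picked, and you should stop any running display manager/X session first):

chmod +x NVIDIA-Linux-x86_64-<version>.run
sudo ./NVIDIA-Linux-x86_64-<version>.run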

Installing CUDA Drivers

  • This step varies a little from one distribution to another, so it is better to refer to the official documentation for your distribution.
  • Reboot your computer afterward.
  • By this point you should be able to see your GPU and CUDA information with nvidia-smi, as shown below.
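
A quick check (the exact table layout varies between driver versions, but it should report your GPU model, the driver version, and a CUDA version):

nvidia-smi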

Patching your GPU Driver

  • It is very straightforward. First, clone the patch repository locally:
git clone https://github.com/keylase/nvidia-patch.git
cd nvidia-patch
  • Then run the patch script:
chmod +x patch.sh
bash ./patch.sh

Installing Nvidia's container runtime/toolkit
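
Nvidia's Container Toolkit install guide covers the repository-setup commands for each distribution. On a Debian/Ubuntu node, once the libnvidia-container apt repository is configured as described there, the install itself boils down to (a sketch under that assumption):

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

With the toolkit in place, you can verify that the Nvidia runtime can see the GPU by running a CUDA container directly through containerd. On a k3s node you may need to use k3s ctr (or point ctr at k3s' containerd socket) instead of a standalone ctr binary: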

sudo ctr image pull nvcr.io/nvidia/cuda:11.8.0-base-ubuntu22.04
sudo ctr run --rm -t \
    --runc-binary=/usr/bin/nvidia-container-runtime \
    --env NVIDIA_VISIBLE_DEVICES=all \
    nvcr.io/nvidia/cuda:11.8.0-base-ubuntu22.04 \
    cuda-11.8.0-base-ubuntu22.04 nvidia-smi

Adjusting containerd to use Nvidia's container runtime

  • If you are using k3s, then the containerd config file of reference is located at /var/lib/rancher/k3s/agent/etc/containerd/config.toml. Note that k3s regenerates this file on startup, so to make the change persistent you can copy it to config.toml.tmpl in the same directory and edit the template instead.

  • If you are using standard Kubernetes with containerd, then your config file is probably located at /etc/containerd/config.toml

  • Open the file mentioned above in a text editor, and add the following values to it:

version = 2
[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    [plugins."io.containerd.grpc.v1.cri".containerd]
      default_runtime_name = "nvidia"
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
            BinaryName = "/usr/bin/nvidia-container-runtime"
  • So in the end it should look something like this:
version = 2

[plugins."io.containerd.internal.v1.opt"]
  path = "/var/lib/rancher/k3s/agent/containerd"

[plugins."io.containerd.grpc.v1.cri"]
  stream_server_address = "127.0.0.1"
  stream_server_port = "10010"
  enable_selinux = false
  enable_unprivileged_ports = true
  enable_unprivileged_icmp = true
  sandbox_image = "rancher/mirrored-pause:3.6"

[plugins."io.containerd.grpc.v1.cri".containerd]
  default_runtime_name = "nvidia"
  snapshotter = "overlayfs"
  disable_snapshot_annotations = true

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  runtime_type = "io.containerd.runc.v2"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
 privileged_without_host_devices = false
 runtime_engine = ""
 runtime_root = ""
 runtime_type = "io.containerd.runc.v2"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
 BinaryName = "/usr/bin/nvidia-container-runtime"
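
For the change to take effect, restart k3s on that node (assuming it runs as a systemd service; use k3s-agent instead of k3s if the node is an agent):

sudo systemctl restart k3s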

Deploying gpu-feature-discovery along node-feature-discovery

Both Helm charts are required to properly label nodes that have a GPU attached to them. Although k8s-device-plugin allows you to deploy gpu-feature-discovery as part of its resources, I decided to split them. You can find my Helm chart values.yaml equivalent file for ArgoCD here:

It is recommended to read each of the individual repositories, the default values.yaml of each Helm deployment, and what each value should be changed to in order to match your infrastructure.

If you need an example, feel free to use mine as a reference to change yours.
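
If you deploy with plain Helm rather than ArgoCD, the installation is roughly the following (a sketch only; the release names and namespaces are arbitrary, and you should pin chart versions and override values to match your cluster):

helm repo add nfd https://kubernetes-sigs.github.io/node-feature-discovery/charts
helm repo add nvgfd https://nvidia.github.io/gpu-feature-discovery
helm repo update

helm install node-feature-discovery nfd/node-feature-discovery \
  --namespace node-feature-discovery --create-namespace
helm install gpu-feature-discovery nvgfd/gpu-feature-discovery \
  --namespace gpu-feature-discovery --create-namespace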

Create a runtime class resource

Apply the following YAML to your cluster:

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia
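
For example, save it as nvidia-runtimeclass.yaml (the file name is arbitrary) and apply it:

kubectl apply -f nvidia-runtimeclass.yaml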

Deploying k8s-device-plugin

There are a few caveats to this deployment in particular. The chart itself can be installed with Helm, roughly as sketched below.
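
This is only a sketch, assuming a plain Helm deployment; the release name and namespace are arbitrary, and you should confirm that the chart version you pick supports the runtimeClassName value, which ties the plugin's pods to the runtime class created above:

helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update

helm install nvidia-device-plugin nvdp/nvidia-device-plugin \
  --namespace nvidia-device-plugin --create-namespace \
  --set runtimeClassName=nvidia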

Common issues

  • "It won't start properly, the container logs read: "Error: failed to create FS watcher: too many open files."

The solution is to increase the inotify limits imposed on the user, in particular fs.inotify.max_user_instances. To check the current values on your system, you can run the following:

cat /proc/sys/fs/inotify/max_user_instances
cat /proc/sys/fs/inotify/max_user_watches

And to increase the values on systems that have an /etc/sysctl.d folder:

echo 'fs.inotify.max_user_watches = 524288' | sudo tee -a /etc/sysctl.d/99-nvidia.conf
echo 'fs.inotify.max_user_instances = 8192' | sudo tee -a /etc/sysctl.d/99-nvidia.conf
sudo sysctl -p /etc/sysctl.d/99-nvidia.conf

Or on systems that don't have it:

echo 'fs.inotify.max_user_watches = 524288' | sudo tee -a /etc/sysctl.conf
echo 'fs.inotify.max_user_instances = 8192' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p /etc/sysctl.conf

The values in those two lines should be higher than the ones initially found in each of the inotify files on your system.

Testing your cluster's GPU deployment

Deploy the following pod, which requests a GPU and uses the new Nvidia runtime class:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-operator-test
spec:
  runtimeClassName: nvidia
  restartPolicy: OnFailure
  containers:
  - name: cuda-vector-add
    image: "nvidia/samples:vectoradd-cuda10.2"
    resources:
      limits:
        nvidia.com/gpu: 1
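
Apply the manifest and, once the pod has completed, read its logs (the file name here is just an example):

kubectl apply -f gpu-test-pod.yaml
kubectl logs gpu-operator-test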

The output should be:

[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done