NVIDIA/kubevirt-gpu-device-plugin
Description: NVIDIA k8s device plugin for Kubevirt
Language: Go
License: BSD-3-Clause
Stars: 284
Forks: 82
Open issues: 23
Created: 2019-08-12T21:35:43Z
Pushed: 2026-06-08T12:44:12Z
Default branch: master
Fork: no
Archived: no
README:
NVIDIA K8s Device Plugin to assign GPUs and vGPUs to KubeVirt VMs
> Starting from v1.1.0, we will only be supporting KubeVirt v0.36.0 or newer. Please use v1.0.1 for compatibility with older KubeVirt versions.
Table of Contents
- [About](#about)
- [Features](#features)
- [Prerequisites](#prerequisites)
- [Quick Start](#quick-start)
- [Docs](#docs)
About
This is a Kubernetes device plugin that discovers and exposes GPUs and vGPUs on a Kubernetes node. It enables you to launch GPU-attached KubeVirt VMs in your Kubernetes cluster. It is developed specifically to serve KubeVirt workloads in a Kubernetes cluster.
Features
- Discovers NVIDIA GPUs that are bound to the VFIO-PCI driver and exposes them as devices that can be attached to a VM in pass through mode.
- Discovers NVIDIA vGPUs configured on a Kubernetes node and exposes them so they can be attached to KubeVirt VMs.
- Performs a basic health check on the GPUs on a Kubernetes node.
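To make the first feature concrete, here is a minimal sketch of how NVIDIA devices bound to `vfio-pci` could be found by reading sysfs. It is an illustration only, not the plugin's actual implementation: the function names `isPassthroughCandidate` and `findVFIODevices` are hypothetical, and the sketch assumes the standard sysfs layout in which each PCI device directory has a `vendor` file and a `driver` symlink. It matches every NVIDIA PCI function (including non-GPU functions such as audio), which a real plugin would likely filter further.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// nvidiaVendorID is the NVIDIA PCI vendor ID in the form sysfs reports it.
const nvidiaVendorID = "0x10de"

// isPassthroughCandidate (hypothetical helper) reports whether a device with
// the given sysfs vendor string and bound driver name is an NVIDIA device
// bound to vfio-pci.
func isPassthroughCandidate(vendor, driver string) bool {
	return strings.EqualFold(strings.TrimSpace(vendor), nvidiaVendorID) && driver == "vfio-pci"
}

// findVFIODevices (hypothetical helper) scans <sysfsRoot>/bus/pci/devices and
// returns the PCI addresses of NVIDIA devices bound to vfio-pci.
func findVFIODevices(sysfsRoot string) ([]string, error) {
	entries, err := filepath.Glob(filepath.Join(sysfsRoot, "bus", "pci", "devices", "*"))
	if err != nil {
		return nil, err
	}
	var found []string
	for _, dev := range entries {
		vendor, err := os.ReadFile(filepath.Join(dev, "vendor"))
		if err != nil {
			continue // entry without a readable vendor file
		}
		// "driver" is a symlink to the directory of the bound driver.
		target, err := os.Readlink(filepath.Join(dev, "driver"))
		if err != nil {
			continue // no driver bound to this device
		}
		if isPassthroughCandidate(string(vendor), filepath.Base(target)) {
			found = append(found, filepath.Base(dev))
		}
	}
	return found, nil
}

func main() {
	devices, err := findVFIODevices("/sys")
	if err != nil {
		fmt.Println("scan failed:", err)
		return
	}
	fmt.Printf("NVIDIA devices bound to vfio-pci: %v\n", devices)
}
```

Passing the sysfs root as a parameter keeps the scan easy to exercise against a temporary directory tree instead of a real host.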
Prerequisites
- An NVIDIA GPU configured for GPU passthrough or vGPU. The Quick Start section provides details about this.
- Kubernetes version >= v1.11
- KubeVirt release >= v0.36.0
- The KubeVirt GPU feature gate must be enabled and the permitted devices must be allowlisted. The feature gate is enabled by creating a ConfigMap. The ConfigMap YAML can be found under /examples.
Quick Start
Before starting the device plugin, the GPUs on a Kubernetes node need to be configured in GPU pass through mode or vGPU mode.
Whitelist GPU and vGPU in KubeVirt CR
GPUs and vGPUs should be allowlisted in the KubeVirt CR following the instructions outlined here. An example KubeVirt CR can be found under /examples.
Preparing a GPU to be used in pass through mode
The GPU must be bound to the VFIO-PCI driver to be used in pass through mode.
##### 1. Enable IOMMU and blacklist nouveau driver on KVM Host
Append "intel_iommu=on modprobe.blacklist=nouveau" to "GRUB_CMDLINE_LINUX"
$ vi /etc/default/grub
# line 6: add (if AMD CPU, add [amd_iommu=on])
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet intel_iommu=on modprobe.blacklist=nouveau"
GRUB_DISABLE_RECOVERY="true"
###### Legacy Mode (BIOS)
grub2-mkconfig -o /boot/grub2/grub.cfg
reboot
###### UEFI Mode
grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg
reboot
After rebooting, verify that IOMMU is enabled using the following command
dmesg | grep -E "DMAR|IOMMU"
Verify that nouveau is disabled
dmesg | grep -i nouveau
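As a complement to the `dmesg` checks, the boot parameters themselves can be inspected. The sketch below reads `/proc/cmdline` and looks for the parameters added to `GRUB_CMDLINE_LINUX`. The function names `iommuRequested` and `nouveauBlacklisted` are hypothetical, and the check only confirms that the parameters were passed to the kernel. It does not prove that the IOMMU is active, so the `dmesg` output remains the authoritative verification.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// iommuRequested (hypothetical helper) reports whether the kernel command
// line contains intel_iommu=on or amd_iommu=on.
func iommuRequested(cmdline string) bool {
	for _, arg := range strings.Fields(cmdline) {
		if arg == "intel_iommu=on" || arg == "amd_iommu=on" {
			return true
		}
	}
	return false
}

// nouveauBlacklisted (hypothetical helper) reports whether nouveau appears in
// a modprobe.blacklist= parameter, which may hold a comma-separated list.
func nouveauBlacklisted(cmdline string) bool {
	const prefix = "modprobe.blacklist="
	for _, arg := range strings.Fields(cmdline) {
		if !strings.HasPrefix(arg, prefix) {
			continue
		}
		for _, module := range strings.Split(strings.TrimPrefix(arg, prefix), ",") {
			if module == "nouveau" {
				return true
			}
		}
	}
	return false
}

func main() {
	raw, err := os.ReadFile("/proc/cmdline")
	if err != nil {
		fmt.Println("cannot read kernel command line:", err)
		return
	}
	cmdline := string(raw)
	fmt.Println("IOMMU requested on command line:", iommuRequested(cmdline))
	fmt.Println("nouveau blacklisted on command line:", nouveauBlacklisted(cmdline))
}
```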
##### 2. Enable vfio-pci kernel module
Determine the vendor-ID and device-ID of the GPU using the following command
lspci -nn | grep -i nvidia
In the example below the vendor-ID is 10de and device-ID is 1b38
$ lspci -nn | grep -i nvidia
04:00.0 3D controller [0302]: NVIDIA Corporation GP102GL [Tesla P40] [10de:1b38] (rev a1)
Update VFIO config
echo "options vfio-pci ids=vendor-ID:device-ID" > /etc/modprobe.d/vfio.conf
With vendor-ID 10de and device-ID 1b38, the command is as follows
echo "options vfio-pci ids=10de:1b38" > /etc/modprobe.d/vfio.conf
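The mapping from the `lspci -nn` line to the `vfio.conf` line can be sketched as follows. This is an illustrative helper only: the name `vfioOptionsLine` is hypothetical, and the regular expression assumes the `[vendor:device]` pair is the first bracketed group containing two four-digit hexadecimal values separated by a colon, as in the example output above.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// pciIDPattern matches a [vendor:device] pair such as [10de:1b38].
// The class code (for example [0302]) has no colon and is not matched.
var pciIDPattern = regexp.MustCompile(`\[([0-9a-f]{4}):([0-9a-f]{4})\]`)

// vfioOptionsLine (hypothetical helper) builds the modprobe options line for
// vfio-pci from a single line of `lspci -nn` output.
func vfioOptionsLine(lspciLine string) (string, error) {
	m := pciIDPattern.FindStringSubmatch(lspciLine)
	if m == nil {
		return "", errors.New("no [vendor:device] pair found in lspci line")
	}
	return fmt.Sprintf("options vfio-pci ids=%s:%s", m[1], m[2]), nil
}

func main() {
	line := "04:00.0 3D controller [0302]: NVIDIA Corporation GP102GL [Tesla P40] [10de:1b38] (rev a1)"
	options, err := vfioOptionsLine(line)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(options) // prints: options vfio-pci ids=10de:1b38
}
```

The sketch only produces the text. Writing it to /etc/modprobe.d/vfio.conf is left to the `echo` command shown above.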
Update config to load VFIO-PCI module after reboot
echo 'vfio-pci' > /etc/modules-load.d/vfio-pci.conf
reboot
Verify VFIO-PCI driver is loaded for the GPU
lspci -nnk -d 10de:
Output below shows that "Kernel driver in use" is "vfio-pci"
$ lspci -nnk -d 10de:
04:00.0 3D controller [0302]: NVIDIA Corporation GP102GL [Tesla P40] [10de:1b38] (rev a1)
    Subsystem: NVIDIA Corporation Device [10de:11d9]
    Kernel driver in use: vfio-pci
    Kernel modules: nouveau
--------------------------------------------------------------
Preparing a GPU to be used in vGPU mode
The NVIDIA Virtual GPU Manager must be installed on the host to configure GPUs in vGPU mode.
##### 1. Change to the mdev_supported_types directory for the physical GPU.
$ cd /sys/class/mdev_bus/domain\:bus\:slot.function/mdev_supported_types/
This example changes to the mdev_supported_types directory for the GPU with the domain 0000 and PCI device BDF 06:00.0.
$ cd /sys/bus/pci/devices/0000\:06\:00.0/mdev_supported_types/
##### 2. Find out which subdirectory of mdev_supported_types contains registration information for the vGPU type that you want to create.
$ grep -l "vgpu-type" nvidia-*/name
vgpu-type -- The vGPU type, for example, M10-2Q.
This example shows that the registration information for the M10-2Q vGPU type is contained in the nvidia-41 subdirectory of mdev_supported_types.
$ grep -l "M10-2Q" nvidia-*/name
nvidia-41/name
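The same lookup can be sketched in Go for cases where it needs to be scripted. The helper names `nameMatches` and `findVGPUTypeDir` are hypothetical, and the sketch mirrors the substring match that `grep -l` performs on each `nvidia-*/name` file. The directory path in `main` is the example path from step 1.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// nameMatches (hypothetical helper) mirrors grep's substring match against
// the content of a name file.
func nameMatches(nameFileContent, vgpuType string) bool {
	return strings.Contains(nameFileContent, vgpuType)
}

// findVGPUTypeDir (hypothetical helper) returns the nvidia-* subdirectory of
// supportedTypesDir whose name file mentions vgpuType.
func findVGPUTypeDir(supportedTypesDir, vgpuType string) (string, error) {
	nameFiles, err := filepath.Glob(filepath.Join(supportedTypesDir, "nvidia-*", "name"))
	if err != nil {
		return "", err
	}
	for _, nameFile := range nameFiles {
		content, err := os.ReadFile(nameFile)
		if err != nil {
			continue // skip unreadable entries
		}
		if nameMatches(string(content), vgpuType) {
			return filepath.Base(filepath.Dir(nameFile)), nil
		}
	}
	return "", fmt.Errorf("vGPU type %q not found under %s", vgpuType, supportedTypesDir)
}

func main() {
	dir := "/sys/bus/pci/devices/0000:06:00.0/mdev_supported_types"
	subdir, err := findVGPUTypeDir(dir, "M10-2Q")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println("registration directory:", subdir)
}
```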
##### 3. Confirm that you can create an instance of the vGPU type on the physical GPU.
$ cat subdirectory/available_instances
subdirectory -- The subdirectory that you found in the previous step, for example, nvidia-41.
The number of available instances must be at least 1. If the number is 0, either an instance of another vGPU type already exists on the physical GPU, or the maximum number of allowed instances has already been created.
This example shows that four more instances of the M10-2Q vGPU type can be created on the physical GPU.
$ cat nvidia-41/available_instances
4
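A scripted version of this check might look like the sketch below. The helper names `parseAvailableInstances` and `availableInstances` are hypothetical, and the path in `main` combines the example device and the nvidia-41 subdirectory used above, so it will only resolve on a host with that exact configuration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// parseAvailableInstances (hypothetical helper) converts the content of an
// available_instances file, which ends in a newline, into an integer.
func parseAvailableInstances(content string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(content))
}

// availableInstances (hypothetical helper) reads available_instances from the
// registration directory of a vGPU type.
func availableInstances(typeDir string) (int, error) {
	raw, err := os.ReadFile(filepath.Join(typeDir, "available_instances"))
	if err != nil {
		return 0, err
	}
	return parseAvailableInstances(string(raw))
}

func main() {
	typeDir := "/sys/bus/pci/devices/0000:06:00.0/mdev_supported_types/nvidia-41"
	n, err := availableInstances(typeDir)
	if err != nil {
		fmt.Println("cannot read available_instances:", err)
		return
	}
	if n < 1 {
		fmt.Println("no further instances of this vGPU type can be created")
		return
	}
	fmt.Printf("%d more instance(s) of this vGPU type can be created\n", n)
}
```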
##### 4. Generate a correctly formatted universally unique identifier (UUID) for the vGPU.
$ uuidgen
aa618089-8b16-4d01-a136-25a0f3c73123
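Where `uuidgen` is not available, a random (version 4) UUID in the same textual format can be produced with the Go standard library alone. This is a minimal sketch: the function name `newUUID` is hypothetical, and it assumes a version 4 UUID is acceptable for the vGPU, which matches the format of the example output above.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newUUID (hypothetical helper) returns a random version 4 UUID in the
// canonical 8-4-4-4-12 lowercase hexadecimal form.
func newUUID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set the version field to 4
	b[8] = (b[8] & 0x3f) | 0x80 // set the RFC 4122 variant bits
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

func main() {
	id, err := newUUID()
	if err != nil {
		fmt.Println("cannot generate UUID:", err)
		return
	}
	fmt.Println(id)
}
```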
##### 5. Write the UUID that you obtained in the previous step to the create file in the registration information directory for the vGPU type that you want to create.
$ echo "uuid" > subdirectory/create
uuid -- The UUID that you generated in the previous step, which will become the UUID of the vGPU that you want to create.
subdirectory -- The registration information directory for the vGPU type that you want to create, for example, nvidia-41.
This example…