NVIDIA/gpu-admin-tools
Description: GPU Admin Tools. Includes Confidential Computing controls for H100, and other functionality
Language: Python
Stars: 80
Forks: 29
Open issues: 15
Created: 2023-12-19T20:55:53Z
Pushed: 2026-06-06T00:35:37Z
Default branch: main
Fork: no
Archived: no
README:
NVIDIA GPU Admin Tools
This utility configures supported NVIDIA GPUs, including their Confidential Computing (CC) modes, and performs some debug and test tasks. It is designed to be run as a privileged python3 command.
Supported CC modes are:
- on
  - All supported GPU security features are enabled (e.g., bus encryption, performance counters off)
- devtools
  - All supported GPU security features are enabled, however blocks preventing DevTools profiling/debugging are lifted
- off
  - The GPU operates in its default mode; no supplementary confidential computing features are enabled
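As a minimal sketch of how the three modes map onto a command line, the helper below builds (but does not run) the argument list for a CC mode switch. The function name `build_set_cc_mode_cmd` and its parameters are hypothetical and not part of the tool; only the flags `--devices`, `--set-cc-mode` and `--reset-after-cc-mode-switch` and the script path come from this README.

```python
import shlex

# Modes listed in the README; the tool's --help shows the same choices.
CC_MODES = ("on", "devtools", "off")


def build_set_cc_mode_cmd(mode, devices="gpus", reset=True,
                          script="./nvidia_gpu_tools.py"):
    """Return an argv list for switching CC mode (hypothetical helper)."""
    if mode not in CC_MODES:
        raise ValueError(
            f"unsupported CC mode {mode!r}; expected one of {CC_MODES}"
        )
    cmd = ["sudo", "python3", script,
           "--devices", devices, f"--set-cc-mode={mode}"]
    if reset:
        # Per the help text, the GPU must be reset before the selected
        # mode becomes active; this flag is one way of doing that.
        cmd.append("--reset-after-cc-mode-switch")
    return cmd


if __name__ == "__main__":
    # Print a shell-quoted command for review instead of executing it.
    print(shlex.join(build_set_cc_mode_cmd("devtools", devices="gpus[0:4]")))
```

Returning a list rather than a string keeps the command safe to pass to `subprocess.run` without a shell, and rejecting unknown modes early avoids invoking a privileged command with a typo.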
Most Commonly Used Examples
##### Query the CC mode of all GPUs in the system
sudo python3 ./nvidia_gpu_tools.py --devices gpus --query-cc-mode

##### Query the CC mode of the first 4 GPUs in the system
sudo python3 ./nvidia_gpu_tools.py --devices gpus[0:4] --query-cc-mode

##### Enable CC mode on all GPUs
sudo python3 ./nvidia_gpu_tools.py --devices gpus --set-cc-mode=on --reset-after-cc-mode-switch

##### Disable CC mode on a specific GPU in the system
sudo python3 ./nvidia_gpu_tools.py --devices 45:00.0 --set-cc-mode=off --reset-after-cc-mode-switch

##### Generic debug dump from GPU
sudo python3 ./nvidia_gpu_tools.py --gpu-bdf=45:00.0 --debug-dump --log debug

##### Debug dump of NVLINK state
sudo python3 ./nvidia_gpu_tools.py --gpu-bdf=45:00.0 --nvlink-debug-dump --log debug
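The examples above pass several kinds of `--devices` selectors (`gpus`, `gpus[0:4]`, `45:00.0`). The sketch below is an illustrative pre-flight check that classifies a selector string before it is handed to the tool. It is not the tool's own parser: the names `classify_selector` and `classify_devices` are hypothetical, the patterns cover only the forms listed in the `--devices` help text, and treating the PCI domain as optional is an assumption based on the `45:00.0` example.

```python
import re

# Patterns mirror the selector forms listed under --devices in the help text.
# Illustrative only; the real tool may accept more forms or fewer.
_HEX4 = r"[0-9a-fA-F]{4}"
_PATTERNS = [
    ("all", re.compile(r"(gpus|nvswitches)")),
    ("index", re.compile(r"(gpus|nvswitches)\[\d+\]")),
    # The help text documents the n:m range form for GPUs only.
    ("range", re.compile(r"gpus\[\d+:\d+\]")),
    ("vendor:device", re.compile(rf"{_HEX4}:{_HEX4}")),
    # Assumption: the domain prefix is optional, as in '45:00.0'.
    ("bdf", re.compile(
        rf"({_HEX4}:)?[0-9a-fA-F]{{2}}:[0-9a-fA-F]{{2}}\.[0-7]")),
]


def classify_selector(spec):
    """Return the kind of a single specifier, or None if unrecognized."""
    for kind, pattern in _PATTERNS:
        if pattern.fullmatch(spec):
            return kind
    return None


def classify_devices(arg):
    """Split a comma-separated --devices value and classify each part."""
    return [(spec, classify_selector(spec)) for spec in arg.split(",")]


if __name__ == "__main__":
    for spec, kind in classify_devices("gpus[0:4],45:00.0,nvswitches"):
        print(f"{spec}: {kind or 'unrecognized'}")
```

A check like this can catch a mistyped selector before a privileged command runs, but the tool itself remains the authority on what it accepts.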
Usage
sudo python3 nvidia_gpu_tools.py --help
NVIDIA GPU Tools version v2025.03.26o
Command line arguments: ['nvidia_gpu_tools.py', '--help']
usage: nvidia_gpu_tools.py [-h] [--devices DEVICES] [--gpu GPU]
[--gpu-bdf GPU_BDF] [--gpu-name GPU_NAME]
[--no-gpu]
[--log {debug,info,warning,error,critical}]
[--mmio-access-type {devmem,sysfs}]
[--recover-broken-gpu]
[--set-next-sbr-to-fundamental-reset]
[--reset-with-sbr] [--reset-with-flr]
[--reset-with-os] [--remove-from-os]
[--sysfs-bind SYSFS_BIND] [--sysfs-unbind]
[--query-ecc-state] [--query-cc-mode]
[--query-cc-settings] [--query-ppcie-mode]
[--query-ppcie-settings] [--query-prc-knobs]
[--set-cc-mode {off,on,devtools}]
[--reset-after-cc-mode-switch]
[--test-cc-mode-switch]
[--reset-after-ppcie-mode-switch]
[--set-ppcie-mode {off,on}]
[--test-ppcie-mode-switch]
[--set-bar0-firewall-mode {off,on}]
[--query-bar0-firewall-mode]
[--query-l4-serial-number] [--query-module-name]
[--clear-memory] [--debug-dump]
[--nvlink-debug-dump]
[--knobs-reset-to-defaults-list]
[--knobs-reset-to-defaults KNOBS_RESET_TO_DEFAULTS [KNOBS_RESET_TO_DEFAULTS ...]]
[--knobs-reset-to-defaults-assume-no-pending-changes]
[--knobs-reset-to-defaults-test] [--noop]
[--force-ecc-on-after-reset] [--test-ecc-toggle]
[--query-mig-mode] [--force-mig-off-after-reset]
[--test-mig-toggle]
[--block-nvlink BLOCK_NVLINK [BLOCK_NVLINK ...]]
[--block-all-nvlinks] [--test-nvlink-blocking]
[--dma-test] [--test-pcie-p2p]
[--read-sysmem-pa READ_SYSMEM_PA]
[--write-sysmem-pa WRITE_SYSMEM_PA WRITE_SYSMEM_PA]
[--read-config-space READ_CONFIG_SPACE]
[--write-config-space WRITE_CONFIG_SPACE WRITE_CONFIG_SPACE]
[--read-bar0 READ_BAR0]
[--write-bar0 WRITE_BAR0 WRITE_BAR0]
[--read-bar1 READ_BAR1]
[--write-bar1 WRITE_BAR1 WRITE_BAR1]
[--ignore-nvidia-driver]
{} ...
positional arguments:
{}
options:
-h, --help show this help message and exit
--devices DEVICES Generic device selector supporting multiple comma-separated specifiers:
- 'gpus' - Find all NVIDIA GPUs
- 'gpus[n]' - Find nth NVIDIA GPU
- 'gpus[n:m]' - Find NVIDIA GPUs from index n to m
- 'nvswitches' - Find all NVIDIA NVSwitches
- 'nvswitches[n]' - Find nth NVIDIA NVSwitch
- 'vendor:device' - Find devices matching 4-digit hex vendor:device ID
- 'domain:bus:device.function' - Find device at specific BDF address
--gpu GPU
--gpu-bdf GPU_BDF Select a single GPU by providing a substring of the
BDF, e.g. '01:00'.
--gpu-name GPU_NAME Select a single GPU by providing a substring of the
GPU name, e.g. 'T4'. If multiple GPUs match, the first
one will be used.
--no-gpu Do not use any of the GPUs; commands requiring one
will not work.
--log {debug,info,warning,error,critical}
--mmio-access-type {devmem,sysfs}
On Linux, specify whether to do MMIO through /dev/mem
or /sys/bus/pci/devices/.../resourceN
--recover-broken-gpu Attempt recovering a broken GPU (unresponsive config
space or MMIO) by performing an SBR. If the GPU is
broken from the beginning and hence correct config
space wasn't saved then reenumerate it in the OS by
sysfs remove/rescan to restore BARs etc.
--set-next-sbr-to-fundamental-reset
Configure the GPU to make the next SBR same as
fundamental reset. After the SBR this setting resets
back to False. Supported on H100 only.
--reset-with-sbr Reset the GPU with SBR and restore its config space
settings, before any other actions
--reset-with-flr Reset the GPU with FLR and restore its config space
settings, before any other actions
--reset-with-os Reset with OS through /sys/.../reset
--remove-from-os Remove from OS through /sys/.../remove
--sysfs-bind SYSFS_BIND
Bind devices to the specified driver
--sysfs-unbind Unbind devices from the current driver
--query-ecc-state Query the ECC state of the GPU
--query-cc-mode Query the current Confidential Computing (CC) mode of
the GPU.
--query-cc-settings Query the Confidential Computing (CC) settings of the
GPU. This prints the lower level setting knobs that
will take effect upon GPU reset.
--query-ppcie-mode Query the current Protected PCIe (PPCIe) mode of the
GPU or switch.
--query-ppcie-settings
Query the Protected PCIe (PPCIe) settings of the GPU
or switch. This prints the lower level setting knobs
that will take effect upon GPU or switch reset.
--query-prc-knobs Query all the Product Reconfiguration (PRC) knobs.
--set-cc-mode {off,on,devtools}
Configure Confidential Computing (CC) mode. The
choices are off (disabled), on (enabled) or devtools
(enabled in DevTools mode). The GPU needs to be reset
to make the selected mode active. See
--reset-after-cc-mode-switch for one way of doing it.
--reset-after-cc-mode-switch
Reset the GPU after switching CC mode such that it is
activated immediately.
--test-cc-mode-switch
Test switching CC modes.
--reset-after-ppcie-mode-switch
Reset the GPU or switch after switching…