Repo · NVIDIA · published Jul 17, 2024

NVIDIA/nvrc

Rust


Description: The NVRC project provides a Rust binary that implements a simple init system for microVMs.

Language: Rust

License: Apache-2.0

Stars: 35

Forks: 16

Open issues: 20

Created: 2024-07-17T15:58:39Z

Pushed: 2026-06-10T18:13:47Z

Default branch: main

Fork: no

Archived: no

README:

NVRC - NVIDIA Runtime Container Init

A minimal init system (PID 1) for ephemeral NVIDIA GPU-enabled VMs running under Kata Containers. NVRC sets up GPU drivers, configures hardware, spawns NVIDIA management daemons, and hands off to kata-agent for container orchestration.

Design Philosophy

Fail Fast, Fail Hard: NVRC is designed for ephemeral confidential VMs where any configuration failure should immediately terminate the VM. There are no recovery mechanisms—if GPU initialization fails, the VM powers off. This "panic-on-failure" approach ensures:

  • Security: No undefined states in confidential computing environments
  • Simplicity: No complex error recovery logic to audit
  • Clarity: If it's running, it's configured correctly
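The panic-on-failure policy maps naturally onto Rust's `std::panic::set_hook`: every failed step panics, and one process-wide hook turns any panic into a power-off. The sketch below is an illustration, not NVRC's code. `power_off` and `must` are hypothetical helpers, and `power_off` only sets a flag here; a real PID 1 would instead invoke the Linux `reboot(2)` system call with `RB_POWER_OFF`.

```rust
use std::fmt::Display;
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};

/// Records that the hook asked for a power-off (stand-in for the real syscall).
static POWER_OFF_REQUESTED: AtomicBool = AtomicBool::new(false);

/// Hypothetical stand-in: a real init would call reboot(2) with RB_POWER_OFF here.
fn power_off() {
    POWER_OFF_REQUESTED.store(true, Ordering::SeqCst);
}

/// Install a process-wide hook so that any panic ends in a power-off request.
fn install_panic_hook() {
    panic::set_hook(Box::new(|info| {
        eprintln!("nvrc: fatal: {info}");
        power_off();
    }));
}

/// Hypothetical fail-fast helper: return the value of a successful setup step,
/// or panic with the step name so the hook powers the VM off.
fn must<T, E: Display>(step: &str, result: Result<T, E>) -> T {
    match result {
        Ok(value) => value,
        Err(err) => panic!("{step} failed: {err}"),
    }
}
```

Funnelling every error through `panic!` means the only failure path to audit is the hook itself.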

Architecture

flowchart TD
Start([NVRC starts as PID 1]) --> PanicHook[Set panic hook
power off VM on panic]
PanicHook --> MountFS[Mount filesystems
/proc /dev /sys /run /tmp]
MountFS --> LoopbackUp[Bring up loopback interface]
LoopbackUp --> InitKernlog[Initialize kernel logging]
InitKernlog --> PollSyslogOnce[Poll syslog once]
PollSyslogOnce --> ParseKernel[Parse kernel parameters
/proc/cmdline]

ParseKernel --> DetectMode[Detect mode]
DetectMode --> ModeSelect{Mode?}

ModeSelect -->|gpu default| GPUMode[GPU Mode]
ModeSelect -->|cpu| CPUMode[CPU Mode]
ModeSelect -->|nvswitch-nvl4| NVL4Mode[ServiceVM NVL4
H100/H200/H800]
ModeSelect -->|nvswitch-nvl5| NVL5Mode[ServiceVM NVL5
B100/B200/B300]

GPUMode --> GPUSteps[• Load nvidia.ko nvidia-uvm
• Start nvidia-persistenced
• nvidia-smi: lmc lgc pl srs
• nv-hostengine dcgm-exporter
• Generate CDI spec
• Health checks]

CPUMode --> CPUSteps[• Skip GPU initialization]

NVL4Mode --> NVL4Steps[• Load nvidia.ko
• Start fabric-mgr greedy
• Health checks]

NVL5Mode --> NVL5Steps[• Load ib_umad mlx5_ib
• Detect CX7 port GUID
• Start nvlsm
• Start fabric-mgr symmetric
• Health checks]

GPUSteps --> Lockdown
CPUSteps --> Lockdown
NVL4Steps --> Lockdown
NVL5Steps --> Lockdown

Lockdown[Disable kernel module loading
security lockdown]
Lockdown --> ForkAgent[Fork kata-agent
handoff control to guest agent]
ForkAgent --> PollSyslog[Poll syslog forever
keep PID 1 alive]

style Start fill:#e1f5ff
style PollSyslog fill:#e1f5ff
style GPUMode fill:#c8e6c9
style CPUMode fill:#fff9c4
style NVL4Mode fill:#ffccbc
style NVL5Mode fill:#ffccbc
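The lockdown step can be approximated with the Linux `kernel.modules_disabled` sysctl. Writing `1` to `/proc/sys/kernel/modules_disabled` blocks all further module loading and unloading, and the setting cannot be reverted without a reboot. Whether NVRC uses exactly this knob is an assumption. The sketch takes the path as a parameter so it can be exercised against a temporary file; `disable_module_loading` and `lockdown` are hypothetical names.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Real sysctl path on Linux; tests pass a temporary file instead.
const MODULES_DISABLED: &str = "/proc/sys/kernel/modules_disabled";

/// Write "1" to the given modules_disabled file and read it back to confirm.
fn disable_module_loading(path: &Path) -> io::Result<()> {
    fs::write(path, b"1\n")?;
    let now = fs::read_to_string(path)?;
    if now.trim() != "1" {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("{} still reads {:?}", path.display(), now.trim()),
        ));
    }
    Ok(())
}

/// Fail-fast wrapper for the end of setup: any error aborts boot via panic.
#[allow(dead_code)]
fn lockdown() {
    disable_module_loading(Path::new(MODULES_DISABLED))
        .expect("failed to disable kernel module loading");
}
```

Running the lockdown after all drivers are loaded, and before handing off to kata-agent, means nothing started later can load additional kernel code.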

Kernel Parameters

NVRC is configured entirely via kernel command-line parameters (no config files). This is critical for minimal init environments where userspace configuration doesn't exist yet.
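Since all configuration arrives on one line in `/proc/cmdline`, parsing starts by splitting that line into `nvrc.*` key/value pairs. A minimal sketch, assuming whitespace-separated `key=value` tokens and ignoring the kernel's quoting rules; `nvrc_params` is a hypothetical name.

```rust
use std::collections::HashMap;

/// Collect `nvrc.*` parameters from a kernel command line.
/// Tokens without '=' and parameters for other subsystems are ignored;
/// if a key repeats, the last occurrence wins.
fn nvrc_params(cmdline: &str) -> HashMap<String, String> {
    cmdline
        .split_whitespace()
        .filter_map(|token| token.split_once('='))
        .filter(|(key, _)| key.starts_with("nvrc."))
        .map(|(key, value)| (key.to_string(), value.to_string()))
        .collect()
}
```

On a running VM the input would come from `std::fs::read_to_string("/proc/cmdline")`.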

Core Parameters

| Parameter | Values | Default | Description |
| --------- | ------ | ------- | ----------- |
| nvrc.mode | gpu, cpu, nvswitch-nvl4, nvswitch-nvl5 | gpu | Operation mode. cpu for CPU-only, nvswitch-nvl4 for H100/H200/H800 service VMs, nvswitch-nvl5 for B200/B300/B100 service VMs. |
| nvrc.log | off, error, warn, info, debug, trace | off | Log verbosity level. Also enables /proc/sys/kernel/printk_devkmsg. |
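A sketch of turning the two core parameters into typed values, with the table's defaults applied when a parameter is absent. Treating an unrecognized value as fatal is an assumption chosen to match the fail-fast policy; the enum and function names are hypothetical.

```rust
/// Operation modes from the table; `Gpu` is the default.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum Mode {
    #[default]
    Gpu,
    Cpu,
    NvswitchNvl4,
    NvswitchNvl5,
}

/// Log levels from the table, ordered from quietest to most verbose; `Off` is the default.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Default)]
enum LogLevel {
    #[default]
    Off,
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

/// Parse nvrc.mode: absent means the default, an unknown value panics.
fn parse_mode(value: Option<&str>) -> Mode {
    match value {
        None => Mode::default(),
        Some("gpu") => Mode::Gpu,
        Some("cpu") => Mode::Cpu,
        Some("nvswitch-nvl4") => Mode::NvswitchNvl4,
        Some("nvswitch-nvl5") => Mode::NvswitchNvl5,
        Some(other) => panic!("invalid nvrc.mode: {other}"),
    }
}

/// Parse nvrc.log: absent means the default, an unknown value panics.
fn parse_log(value: Option<&str>) -> LogLevel {
    match value {
        None => LogLevel::default(),
        Some("off") => LogLevel::Off,
        Some("error") => LogLevel::Error,
        Some("warn") => LogLevel::Warn,
        Some("info") => LogLevel::Info,
        Some("debug") => LogLevel::Debug,
        Some("trace") => LogLevel::Trace,
        Some(other) => panic!("invalid nvrc.log: {other}"),
    }
}
```

Deriving `Ord` on `LogLevel` lets a logger compare a message's level against the configured one with a plain `<=`.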

GPU Configuration

| Parameter | Values | Default | Description |
| --------- | ------ | ------- | ----------- |
| nvrc.smi.lgc | clock in MHz | - | Lock GPU core clocks to a fixed frequency. Reduces clock variation for consistent performance. |
| nvrc.smi.lmc | clock in MHz | - | Lock memory clocks to a fixed frequency. Used alongside lgc for more deterministic GPU behavior. |
| nvrc.smi.pl | power in watts | - | Set the GPU power limit. Lower values reduce heat and power draw; higher values allow peak performance. |
| nvrc.smi.srs | enabled, disabled | - | Secure Randomization Seed for GPU memory (passed to nvidia-smi). |
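These settings could be translated into `nvidia-smi` invocations as sketched below. The `-lgc`, `-lmc`, and `-pl` options are real `nvidia-smi` flags; locking a clock by passing the same value as minimum and maximum (for example `1500,1500`) is an assumption about how NVRC calls them, and `nvrc.smi.srs` is left out because its exact invocation is not documented here. `SmiSettings`, `parse_num`, and `smi_commands` are hypothetical.

```rust
/// Hypothetical container for the optional nvrc.smi.* values.
#[derive(Debug, Default)]
struct SmiSettings {
    lgc_mhz: Option<u32>,
    lmc_mhz: Option<u32>,
    pl_watts: Option<u32>,
}

/// Parse an optional numeric parameter; a malformed value is fatal.
fn parse_num(key: &str, value: Option<&str>) -> Option<u32> {
    value.map(|v| {
        v.parse::<u32>()
            .unwrap_or_else(|e| panic!("invalid {key}={v}: {e}"))
    })
}

/// Build one nvidia-smi argument list per configured setting, in a fixed order.
/// Assumption: clocks are locked by giving the same value as min and max.
fn smi_commands(s: &SmiSettings) -> Vec<Vec<String>> {
    let mut cmds = Vec::new();
    if let Some(mhz) = s.lgc_mhz {
        cmds.push(vec!["-lgc".to_string(), format!("{mhz},{mhz}")]);
    }
    if let Some(mhz) = s.lmc_mhz {
        cmds.push(vec!["-lmc".to_string(), format!("{mhz},{mhz}")]);
    }
    if let Some(watts) = s.pl_watts {
        cmds.push(vec!["-pl".to_string(), watts.to_string()]);
    }
    cmds
}
```

Each inner vector would be passed to `std::process::Command::new("nvidia-smi").args(...)`, with a non-zero exit status treated as fatal.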

Daemon Control

| Parameter | Values | Default | Description |
| --------- | ------ | ------- | ----------- |
| nvrc.uvm.persistence.mode | on/off, true/false, 1/0, yes/no | true | UVM persistence mode keeps unified memory state across CUDA context teardowns. |
| nvrc.dcgm | on/off, true/false, 1/0, yes/no | false | Enable DCGM (Data Center GPU Manager) for telemetry and health monitoring. |
| nvrc.fm.mode | 0, 1 | - | Fabric Manager mode: 0=bare metal, 1=servicevm (shared nvswitch). Auto-set in nvswitch modes. |
| nvrc.fm.rail.policy | greedy, symmetric | greedy | Partition rail policy. Symmetric required for Confidential Computing on Blackwell. |
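The boolean switches accept four spellings each. A minimal sketch of parsing them with the documented defaults (`true` for UVM persistence, `false` for DCGM); exact, case-sensitive matching and panicking on anything else are assumptions, and the function names are hypothetical.

```rust
/// Parse an on/off, true/false, 1/0, yes/no switch.
/// A missing parameter yields `default`; any other spelling is fatal.
fn parse_bool(key: &str, value: Option<&str>, default: bool) -> bool {
    match value {
        None => default,
        Some("on" | "true" | "1" | "yes") => true,
        Some("off" | "false" | "0" | "no") => false,
        Some(other) => panic!("invalid boolean for {key}: {other}"),
    }
}

/// Resolve (uvm_persistence, dcgm) using the defaults from the table.
fn daemon_flags(uvm: Option<&str>, dcgm: Option<&str>) -> (bool, bool) {
    (
        parse_bool("nvrc.uvm.persistence.mode", uvm, true),
        parse_bool("nvrc.dcgm", dcgm, false),
    )
}
```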

Example Configurations

Minimal GPU setup (defaults):

nvrc.mode=gpu

CPU-only mode:

nvrc.mode=cpu

NVSwitch NVL4 mode (Service VM for HGX H100/H200/H800 - NVLink 4.0):

nvrc.mode=nvswitch-nvl4

NVSwitch NVL5 mode (Service VM for HGX B200/B300/B100 - NVLink 5.0):

nvrc.mode=nvswitch-nvl5

GPU with locked clocks for benchmarking:

nvrc.mode=gpu nvrc.smi.lgc=1500 nvrc.smi.lmc=5001 nvrc.smi.pl=300

GPU with DCGM monitoring:

nvrc.mode=gpu nvrc.dcgm=on nvrc.log=info

Multi-GPU with NVLink:

nvrc.mode=gpu nvrc.fm.mode=0 nvrc.log=debug
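The NVSwitch examples never set `nvrc.fm.mode` because the Daemon Control table says it is auto-set in those modes, while the NVLink example sets it explicitly. A hedged sketch of resolving the effective Fabric Manager settings, assuming that the auto-set value is `1` (servicevm), that the rail policy in each service-VM mode follows the flowchart (greedy for NVL4, symmetric for NVL5), and that GPU mode starts Fabric Manager only when `nvrc.fm.mode` is given. All type and function names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Gpu,
    Cpu,
    NvswitchNvl4,
    NvswitchNvl5,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RailPolicy {
    Greedy,
    Symmetric,
}

/// Effective Fabric Manager settings (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct FabricManager {
    mode: u8,
    rail_policy: RailPolicy,
}

/// nvrc.fm.rail.policy, defaulting to greedy as in the table.
fn parse_rail_policy(value: Option<&str>) -> RailPolicy {
    match value {
        None | Some("greedy") => RailPolicy::Greedy,
        Some("symmetric") => RailPolicy::Symmetric,
        Some(other) => panic!("invalid nvrc.fm.rail.policy: {other}"),
    }
}

/// Return the Fabric Manager settings to use, or None if it is not started.
fn fabric_manager(mode: Mode, fm_mode: Option<&str>, rail: Option<&str>) -> Option<FabricManager> {
    match mode {
        Mode::Cpu => None,
        // Service VMs: mode assumed auto-set to 1; policy taken from the flowchart.
        Mode::NvswitchNvl4 => Some(FabricManager { mode: 1, rail_policy: RailPolicy::Greedy }),
        Mode::NvswitchNvl5 => Some(FabricManager { mode: 1, rail_policy: RailPolicy::Symmetric }),
        // GPU mode: only when nvrc.fm.mode is given explicitly.
        Mode::Gpu => fm_mode.map(|v| {
            let fm = match v {
                "0" => 0,
                "1" => 1,
                other => panic!("invalid nvrc.fm.mode: {other}"),
            };
            FabricManager { mode: fm, rail_policy: parse_rail_policy(rail) }
        }),
    }
}
```

For the last example above, `fabric_manager(Mode::Gpu, Some("0"), None)` yields bare-metal mode with the greedy policy.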

Build

NVRC is compiled as a statically linked musl binary to minimize runtime dependencies:

# x86_64
cargo build --release --target x86_64-unknown-linux-musl

# aarch64
cargo build --release --target aarch64-unknown-linux-musl

Build configuration in .cargo/config.toml enables aggressive size optimization and static linking.

Testing

# Unit tests (requires root for some tests)
cargo test

# Coverage (requires llvm-cov and root)
cargo llvm-cov --all-features --workspace

# Fuzzing
cargo +nightly fuzz run kernel_params

# Static analysis
cargo clippy --all-features -- -D warnings
cargo audit
cargo deny check...
