What does this release signal mean?

NVIDIA Release: NVIDIA/k8s-test-infra v0.2.0

published Jun 12, 2026seen Jun 13captured Jun 13http 200method plain

Repository: NVIDIA/k8s-test-infra

Tag: v0.2.0

Published: 2026-06-12T08:05:10Z

Prerelease: no

Release notes:

Mock InfiniBand subsystem: real ibstat, ibstatus, iblinkinfo, ibv_devinfo, and cross-node ibping now work inside the nvml-mock DaemonSet without InfiniBand hardware. LD_PRELOAD shims (libibmocksys.so, libibmockumad.so, libibmockverbs.so) redirect sysfs/umad/verbs access to a fake tree rendered per profile, and the in-pod mock-ib daemon relays MAD traffic between pods over the Kubernetes network. The daemon and Service are only created for profiles with InfiniBand enabled. (#367)
New GPU profile `gb300`: NVIDIA GB300 NVL (Grace-Blackwell Ultra) — 8 GPUs/node, 288 GiB HBM3e per GPU, 1.4 kW TDP, PCIe Gen6, NVLink v5, driver line 570.124.06, with an NVL72-shaped PCIe topology out of the box.
PCIe sysfs topology: profiles carry a pcie_topology: block and the new render-pci-sysfs binary materializes a fake /sys/bus/pci/devices tree, so topology-aware consumers (NVIDIA DRA driver dra.k8s.io/pcieRoot, device-plugin NUMA hints) resolve PCIe root complexes. (#263)
NVML library-size padding: libnvidia-ml.so is padded to land within ~10% of the real driver library size, so size-sanity tooling accepts the mock; configurable or fully disableable. (#247)
Canonical sysfs `bus_id` form in profiles (0000:07:00.0), aligning the mock with real Linux PCI sysfs. (#263)
Test suite standardized on `testify/require`; soft t.Errorf checks upgraded to hard failures. (#386)

docs/demo/standalone/demo.sh runs on macOS stock bash 3.2 (no more mapfile). (#385)
Helm chart OCI publishing signs correctly again: cosign now authenticates to GHCR and signs the chart by digest. (#388)

docker pull ghcr.io/nvidia/nvml-mock:0.2.0

helm install nvml-mock oci://ghcr.io/nvidia/k8s-test-infra/chart/nvml-mock --version 0.2.0

Image and chart are cosign-signed (keyless). Full details in CHANGELOG.md.

Full diff: https://github.com/NVIDIA/k8s-test-infra/compare/v0.1.0...v0.2.0

The following v0.2.0 features were missing from the original notes (the changelog has been amended to match):

Dynamic per-query metric sampling: utilization, temperature, power, and clocks vary plausibly across calls. (#323)
GPU failure injection: ecc_uncorrectable, lost, and fallen_off_bus modes with Xid 79 propagation. (#328)
ComputeDomain / NVLink fabric simulation: nvmlDeviceGetGpuFabricInfo(+V) driven by a cluster topology ConfigMap, with fake nvidia-imex / nvidia-imex-ctl peer-readiness coordination. (#337, #342)
Toolkit-ready marker file for GPU Operator validator compatibility. (#346)

notability 2.0/10

Routine release of a Kubernetes test infrastructure tool.