NVIDIA/k8s-test-infra
Go
Description: K8s-test-infra
Language: Go
License: Apache-2.0
Stars: 27
Forks: 16
Open issues: 28
Created: 2023-12-19T21:07:05Z
Pushed: 2026-06-10T11:29:18Z
Default branch: main
Fork: no
Archived: no
README:
k8s-test-infra
 
Kubernetes test infrastructure for NVIDIA GPU software — mock GPU environments, CI tooling, and testing utilities.
nvml-mock
Turn any Kubernetes cluster into a multi-GPU environment for testing. No physical NVIDIA hardware required.
```bash
# 1. Create cluster
kind create cluster --name gpu-test

# 2. Load the published image (or build locally with: docker build -t nvml-mock:local -f deployments/nvml-mock/Dockerfile .)
docker pull ghcr.io/nvidia/nvml-mock:latest
kind load docker-image ghcr.io/nvidia/nvml-mock:latest --name gpu-test

# 3. Install
helm install nvml-mock oci://ghcr.io/nvidia/k8s-test-infra/chart/nvml-mock
```
After install, deploy a consumer to test:
| Consumer | Guide |
|----------|-------|
| NVIDIA Device Plugin | [Quick Start](deployments/nvml-mock/helm/nvml-mock/README.md#quick-start-device-plugin-on-kind) |
| NVIDIA DRA Driver | [Quick Start](deployments/nvml-mock/helm/nvml-mock/README.md#quick-start-dra-driver-on-kind) |
| NVIDIA GPU Operator | [Quick Start](deployments/nvml-mock/helm/nvml-mock/README.md#quick-start-gpu-operator-on-kind) |
Full documentation: [nvml-mock Helm chart README](deployments/nvml-mock/helm/nvml-mock/README.md)
E2E Testing
The nvml-mock E2E workflow tests all GPU consumers across multiple profiles and node topologies. It runs automatically on PRs and can also be triggered manually via `workflow_dispatch`.
| Test Suite | What It Validates | Profiles |
|------------|-------------------|----------|
| Device Plugin | `nvidia.com/gpu` allocatable resources | A100, H100, T4 |
| DRA Driver | ResourceSlices via Dynamic Resource Allocation | A100, H100, T4 |
| GPU Operator | Operator components: device plugin + GFD + validator (CDI injection) | A100, H100, T4 |
| Multi-Node Fleet | Cross-node scheduling with heterogeneous GPUs | A100 + T4 |
Manual dispatch supports all 7 profiles: a100, h100, b200, gb200, gb300, l40s, t4.
See [.github/workflows/nvml-mock-e2e.yaml](.github/workflows/nvml-mock-e2e.yaml) for details.
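As an illustration of what the Device Plugin suite checks, the sketch below parses the JSON printed by `kubectl get nodes -o json` and reports each node's allocatable `nvidia.com/gpu` count. It is not taken from the workflow itself: the `allocatableGPUs` helper, the `nodeList` type, and the sample node names and counts are hypothetical, and it assumes the GPU quantity is a plain integer string.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// gpuResource is the extended resource name validated by the Device Plugin suite.
const gpuResource = "nvidia.com/gpu"

// nodeList mirrors only the NodeList fields this sketch needs (hypothetical type).
type nodeList struct {
	Items []struct {
		Metadata struct {
			Name string `json:"name"`
		} `json:"metadata"`
		Status struct {
			Allocatable map[string]string `json:"allocatable"`
		} `json:"status"`
	} `json:"items"`
}

// allocatableGPUs maps node name to allocatable GPU count.
// Nodes that do not advertise the resource are reported as 0.
func allocatableGPUs(raw []byte) (map[string]int, error) {
	var list nodeList
	if err := json.Unmarshal(raw, &list); err != nil {
		return nil, fmt.Errorf("decode node list: %w", err)
	}
	counts := make(map[string]int, len(list.Items))
	for _, n := range list.Items {
		q, ok := n.Status.Allocatable[gpuResource]
		if !ok {
			counts[n.Metadata.Name] = 0
			continue
		}
		c, err := strconv.Atoi(q)
		if err != nil {
			return nil, fmt.Errorf("node %s: parse %q: %w", n.Metadata.Name, q, err)
		}
		counts[n.Metadata.Name] = c
	}
	return counts, nil
}

func main() {
	// Hypothetical sample; real input would come from `kubectl get nodes -o json`.
	sample := []byte(`{"items":[
	  {"metadata":{"name":"gpu-test-control-plane"},"status":{"allocatable":{"cpu":"8"}}},
	  {"metadata":{"name":"gpu-test-worker"},"status":{"allocatable":{"cpu":"8","nvidia.com/gpu":"8"}}}
	]}`)

	counts, err := allocatableGPUs(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	total := 0
	for _, c := range counts {
		total += c
	}
	fmt.Printf("nodes=%d total %s=%d\n", len(counts), gpuResource, total)
}
```

Against a real cluster, the same function could be fed the captured output of `kubectl get nodes -o json`; a node reporting 0 after installing a consumer suggests the device plugin has not registered its GPUs yet.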
Mock NVML Library
The CGo-based mock `libnvidia-ml.so` is the library that powers nvml-mock. It can also be used standalone for local development and CI pipelines.
| Document | Description |
|----------|-------------|
| [Overview](docs/README.md) | Project overview, components, GPU profiles |
| [Quick Start](docs/quickstart.md) | Build and run in 5 minutes |
| [Configuration](docs/configuration.md) | YAML configuration reference |
| [Architecture](docs/architecture.md) | System design and components |
| [CUDA Mock](docs/cuda-mock.md) | Mock CUDA library overview |
| [Development](docs/development.md) | Contributing and extending the library |
| [Examples](docs/examples.md) | Usage patterns and scenarios |
| [Troubleshooting](docs/troubleshooting.md) | Common issues and solutions |
Integrations
| Integration | Description | Guide |
|-------------|-------------|-------|
| fake-gpu-operator | Run:ai's K8s-level GPU simulation | [Integration Guide](docs/integrations/fake-gpu-operator.md) |
Demos
| Demo | Description |
|------|-------------|
| [Standalone](docs/demo/standalone/) | nvml-mock with FGO-style labels on Kind |
| [With fake-gpu-operator](docs/demo/with-fgo/) | Full FGO + nvml-mock integration |
License
Apache License 2.0 — see [LICENSE](LICENSE).