Repo: NVIDIA, published Dec 19, 2023

NVIDIA/k8s-test-infra

Go

Description: K8s-test-infra

Language: Go

License: Apache-2.0

Stars: 27

Forks: 16

Open issues: 28

Created: 2023-12-19T21:07:05Z

Pushed: 2026-06-10T11:29:18Z

Default branch: main

Fork: no

Archived: no

README:

k8s-test-infra

[CI](https://github.com/NVIDIA/k8s-test-infra/actions/workflows/ci.yaml) | [OpenSSF Scorecard](https://scorecard.dev/viewer/?uri=github.com/NVIDIA/k8s-test-infra)

Kubernetes test infrastructure for NVIDIA GPU software — mock GPU environments, CI tooling, and testing utilities.

nvml-mock

Turn any Kubernetes cluster into a multi-GPU environment for testing. No physical NVIDIA hardware required.

```bash
# 1. Create cluster
kind create cluster --name gpu-test

# 2. Load the published image (or build locally with: docker build -t nvml-mock:local -f deployments/nvml-mock/Dockerfile .)
docker pull ghcr.io/nvidia/nvml-mock:latest
kind load docker-image ghcr.io/nvidia/nvml-mock:latest --name gpu-test

# 3. Install
helm install nvml-mock oci://ghcr.io/nvidia/k8s-test-infra/chart/nvml-mock
```

After install, deploy a consumer to test:

| Consumer | Guide |
|----------|-------|
| NVIDIA Device Plugin | [Quick Start](deployments/nvml-mock/helm/nvml-mock/README.md#quick-start-device-plugin-on-kind) |
| NVIDIA DRA Driver | [Quick Start](deployments/nvml-mock/helm/nvml-mock/README.md#quick-start-dra-driver-on-kind) |
| NVIDIA GPU Operator | [Quick Start](deployments/nvml-mock/helm/nvml-mock/README.md#quick-start-gpu-operator-on-kind) |

Full documentation: [nvml-mock Helm chart README](deployments/nvml-mock/helm/nvml-mock/README.md)
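
For Go-based test harnesses, the three quick-start steps could be driven from code rather than a shell. The following is a minimal sketch, not something shipped in this repository: the helper name `quickStartCommands`, the `cluster` parameter, and the `RUN` environment variable are all illustrative assumptions. The image and chart references are copied from the commands above.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// Image and chart references copied from the quick start.
const (
	mockImage = "ghcr.io/nvidia/nvml-mock:latest"
	mockChart = "oci://ghcr.io/nvidia/k8s-test-infra/chart/nvml-mock"
)

// quickStartCommands returns the quick-start steps as argv slices, in order.
// Hypothetical helper: the name and the cluster parameter are illustrative.
func quickStartCommands(cluster string) [][]string {
	return [][]string{
		{"kind", "create", "cluster", "--name", cluster},
		{"docker", "pull", mockImage},
		{"kind", "load", "docker-image", mockImage, "--name", cluster},
		{"helm", "install", "nvml-mock", mockChart},
	}
}

func main() {
	// Dry run by default; RUN=1 (an assumed convention) executes each step.
	execute := os.Getenv("RUN") == "1"
	for _, argv := range quickStartCommands("gpu-test") {
		fmt.Println(argv)
		if !execute {
			continue
		}
		cmd := exec.Command(argv[0], argv[1:]...)
		cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
		if err := cmd.Run(); err != nil {
			fmt.Fprintln(os.Stderr, "step failed:", err)
			os.Exit(1)
		}
	}
}
```

Keeping the steps as data makes the sequence easy to print, reorder, or assert on in a unit test before anything touches a real cluster.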

E2E Testing

The nvml-mock E2E workflow tests all GPU consumers across multiple profiles and node topologies. It runs automatically on PRs and can be triggered manually via `workflow_dispatch`.

| Test Suite | What It Validates | Profiles |
|------------|-------------------|----------|
| Device Plugin | nvidia.com/gpu allocatable resources | A100, H100, T4 |
| DRA Driver | ResourceSlices via Dynamic Resource Allocation | A100, H100, T4 |
| GPU Operator | Operator components: device plugin + GFD + validator (CDI injection) | A100, H100, T4 |
| Multi-Node Fleet | Cross-node scheduling with heterogeneous GPUs | A100 + T4 |

Manual dispatch supports all 7 profiles: a100, h100, b200, gb200, gb300, l40s, t4.
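
A tool that triggers the workflow could validate the profile name first. The sketch below hard-codes the seven names listed above. The helper `normalizeProfile` is hypothetical, and the case-insensitive matching is an assumption made for convenience; the workflow itself may expect the lowercase form exactly.

```go
package main

import (
	"fmt"
	"strings"
)

// profiles lists the seven names the README gives for manual dispatch.
var profiles = []string{"a100", "h100", "b200", "gb200", "gb300", "l40s", "t4"}

// normalizeProfile trims and lowercases name, then checks it against profiles.
// Hypothetical helper; case folding is an assumption, not workflow behavior.
func normalizeProfile(name string) (string, error) {
	p := strings.ToLower(strings.TrimSpace(name))
	for _, known := range profiles {
		if p == known {
			return p, nil
		}
	}
	return "", fmt.Errorf("unknown profile %q (want one of %s)", name, strings.Join(profiles, ", "))
}

func main() {
	for _, in := range []string{"A100", "gb200", "v100"} {
		p, err := normalizeProfile(in)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", p)
	}
}
```

Here `A100` and `gb200` are accepted and `v100` is rejected, since it is not among the listed profiles.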

See [.github/workflows/nvml-mock-e2e.yaml](.github/workflows/nvml-mock-e2e.yaml) for details.
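
As an illustration of the kind of check the Device Plugin suite describes, the sketch below reads the `nvidia.com/gpu` allocatable count per node from node-list JSON in the shape produced by `kubectl get nodes -o json`. It is not the repository's test code: the type `nodeList`, the function `allocatableGPUs`, and the sample node names are assumptions. It also assumes that GPU quantities are plain integers serialized as strings.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// nodeList mirrors only the node-list fields this check needs.
type nodeList struct {
	Items []struct {
		Metadata struct {
			Name string `json:"name"`
		} `json:"metadata"`
		Status struct {
			Allocatable map[string]string `json:"allocatable"`
		} `json:"status"`
	} `json:"items"`
}

// allocatableGPUs returns the nvidia.com/gpu allocatable count per node name.
// Nodes that do not advertise the resource are reported as 0.
func allocatableGPUs(data []byte) (map[string]int, error) {
	var nl nodeList
	if err := json.Unmarshal(data, &nl); err != nil {
		return nil, err
	}
	out := make(map[string]int, len(nl.Items))
	for _, n := range nl.Items {
		count := 0
		if q, ok := n.Status.Allocatable["nvidia.com/gpu"]; ok {
			v, err := strconv.Atoi(q)
			if err != nil {
				return nil, fmt.Errorf("node %s: bad quantity %q: %w", n.Metadata.Name, q, err)
			}
			count = v
		}
		out[n.Metadata.Name] = count
	}
	return out, nil
}

func main() {
	// Sample input with made-up node names; in practice the bytes would come
	// from `kubectl get nodes -o json`.
	sample := []byte(`{"items":[
		{"metadata":{"name":"worker-a"},"status":{"allocatable":{"cpu":"8","nvidia.com/gpu":"8"}}},
		{"metadata":{"name":"worker-b"},"status":{"allocatable":{"cpu":"4"}}}
	]}`)
	counts, err := allocatableGPUs(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for name, n := range counts {
		fmt.Printf("%s: %d GPU(s)\n", name, n)
	}
}
```

A test could then compare the returned counts against the GPU count configured for the selected profile.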

Mock NVML Library

The CGo-based mock `libnvidia-ml.so` that powers nvml-mock. Use it standalone for local development and CI pipelines.

| Document | Description |
|----------|-------------|
| [Overview](docs/README.md) | Project overview, components, GPU profiles |
| [Quick Start](docs/quickstart.md) | Build and run in 5 minutes |
| [Configuration](docs/configuration.md) | YAML configuration reference |
| [Architecture](docs/architecture.md) | System design and components |
| [CUDA Mock](docs/cuda-mock.md) | Mock CUDA library overview |
| [Development](docs/development.md) | Contributing and extending the library |
| [Examples](docs/examples.md) | Usage patterns and scenarios |
| [Troubleshooting](docs/troubleshooting.md) | Common issues and solutions |

Integrations

| Integration | Description | Guide |
|-------------|-------------|-------|
| fake-gpu-operator | Run:ai's K8s-level GPU simulation | [Integration Guide](docs/integrations/fake-gpu-operator.md) |

Demos

| Demo | Description |
|------|-------------|
| [Standalone](docs/demo/standalone/) | nvml-mock with FGO-style labels on Kind |
| [With fake-gpu-operator](docs/demo/with-fgo/) | Full FGO + nvml-mock integration |

License

Apache License 2.0 — see [LICENSE](LICENSE).