NVIDIA/nvcf
Description: NVIDIA Cloud Functions
Language: Go
License: Apache-2.0
Stars: 164
Forks: 14
Open issues: 8
Created: 2026-04-01T19:22:14Z
Pushed: 2026-06-11T00:15:15Z
Default branch: main
Fork: no
Archived: no
README: 

Docs | Roadmap | [Installation](docs/user/installation.md) | [API Reference](docs/user/api.md) | [Contributing](CONTRIBUTING.md) | build.nvidia.com Powered By NVCF
NVIDIA Cloud Functions
NVIDIA Cloud Functions (NVCF) is a platform for deploying, managing, and running GPU-accelerated workloads at scale. It routes inference, streaming, and other GPU work to worker clusters, so you can scale demanding workloads with less infrastructure to run yourself.
This monorepo contains NVCF service code, deployment assets, documentation, examples, CLI code, agent skills, and validation tooling.
Architecture

NVCF runs as Kubernetes services that manage function lifecycle, invocation routing, GPU cluster integration, artifact access, secrets, observability, and operations.
At a high level:
- The control plane exposes the NVCF API, manages function and deployment
state, handles secret management, and coordinates platform operations.
- The invocation plane receives HTTP, streaming, and gRPC requests, applies
routing and rate limiting, and sends work to running function workloads.
- GPU clusters connect through the NVIDIA Cluster Agent (NVCA). NVCA registers
GPU resources and manages workload execution on GPU nodes.
- Function artifacts live in registries that the NVCF deployment can access.
- Observability, dashboards, and runbooks help operators monitor health and
debug workload behavior.
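To make the division of responsibilities above concrete, here is a minimal Go sketch of the flow: a cluster agent registers capacity with the control plane, and the invocation plane applies a rate limit before routing a request. This is a toy model, not NVCF code. Every type, field, and function name in it (`ControlPlane`, `InvocationPlane`, `FreeSlots`, `Route`, and so on) is hypothetical, and the fixed request budget stands in for whatever rate limiting the real services apply.

```go
package main

import (
	"errors"
	"fmt"
)

// Cluster is a hypothetical record of what a cluster agent has registered.
type Cluster struct {
	Name      string
	GPUType   string
	FreeSlots int // assumed: request slots left on running workloads
}

// ControlPlane holds registered clusters (illustrative state only).
type ControlPlane struct {
	clusters []Cluster
}

// Register models a cluster agent announcing its GPU resources.
func (cp *ControlPlane) Register(c Cluster) {
	cp.clusters = append(cp.clusters, c)
}

var (
	ErrRateLimited = errors.New("request budget exhausted")
	ErrNoCapacity  = errors.New("no cluster with a free slot for the requested GPU type")
)

// InvocationPlane applies a simple request budget, then routes.
type InvocationPlane struct {
	cp     *ControlPlane
	budget int // requests still allowed in the current window
}

// Route charges the budget first, then picks the first registered
// cluster that has the requested GPU type and a free slot.
func (ip *InvocationPlane) Route(gpuType string) (string, error) {
	if ip.budget <= 0 {
		return "", ErrRateLimited
	}
	ip.budget--
	for i := range ip.cp.clusters {
		c := &ip.cp.clusters[i]
		if c.GPUType == gpuType && c.FreeSlots > 0 {
			c.FreeSlots--
			return c.Name, nil
		}
	}
	return "", ErrNoCapacity
}

func main() {
	cp := &ControlPlane{}
	cp.Register(Cluster{Name: "us-west", GPUType: "H100", FreeSlots: 1})
	cp.Register(Cluster{Name: "eu-central", GPUType: "L40S", FreeSlots: 2})

	ip := &InvocationPlane{cp: cp, budget: 3}
	for _, gpu := range []string{"H100", "H100", "L40S", "L40S"} {
		name, err := ip.Route(gpu)
		fmt.Printf("gpu=%s cluster=%q err=%v\n", gpu, name, err)
	}
}
```

In this sketch the rate limit is checked before any routing work, so a rejected request never touches cluster state. The real services are separate Kubernetes deployments rather than in-process structs.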
The following diagram shows how self-managed NVCF can span regions and GPU clusters.
Workload types
NVCF functions are long-running, invokable workloads. Use a function when a client needs an endpoint for inference, streaming, or another service-style GPU workflow. Functions can be packaged as a container when the workload is a single service with health and inference endpoints, or as a Helm chart when the workload needs multiple coordinated containers, services, sidecars, or other Kubernetes resources.
NVCF tasks are asynchronous, run-to-completion workloads. Use a task for batch inference, evaluation, fine-tuning, data preparation, or other GPU jobs that should finish and report status instead of staying online behind an invocation endpoint. Like functions, tasks can be packaged as a container or as a Helm chart; the Helm chart option applies when the workload needs multiple coordinated containers, services, sidecars, or other Kubernetes resources.
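The choice between function and task, and between container and Helm chart, can be summarized as a small decision helper. The Go sketch below only encodes the guidance from the two paragraphs above. The `Workload` struct, its fields, and both `Choose…` functions are hypothetical names introduced for illustration and do not correspond to any NVCF API.

```go
package main

import "fmt"

// Kind and Packaging are illustrative labels, not NVCF API values.
type Kind string
type Packaging string

const (
	Function Kind = "function"
	Task     Kind = "task"

	Container Packaging = "container"
	HelmChart Packaging = "helm-chart"
)

// Workload captures the questions the text asks about a workload.
type Workload struct {
	NeedsEndpoint     bool // clients invoke it while it stays online
	Containers        int  // number of coordinated containers
	ExtraK8sResources bool // services, sidecars, or other Kubernetes resources
}

// ChooseKind returns Function for long-running, invokable workloads
// and Task for run-to-completion jobs.
func ChooseKind(w Workload) Kind {
	if w.NeedsEndpoint {
		return Function
	}
	return Task
}

// ChoosePackaging returns HelmChart when the workload needs more than
// one container or additional Kubernetes resources, else Container.
func ChoosePackaging(w Workload) Packaging {
	if w.Containers > 1 || w.ExtraK8sResources {
		return HelmChart
	}
	return Container
}

func main() {
	inference := Workload{NeedsEndpoint: true, Containers: 1}
	fineTune := Workload{NeedsEndpoint: false, Containers: 2, ExtraK8sResources: true}

	fmt.Println(ChooseKind(inference), ChoosePackaging(inference))
	fmt.Println(ChooseKind(fineTune), ChoosePackaging(fineTune))
}
```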
Core capabilities
| Capability | What it does |
|------------|--------------|
| Unified control plane | Manages and routes requests across multi-region GPU clusters. |
| Load-balanced workload routing | Balances inference, streaming, and custom workloads based on worker availability. |
| Multiple protocols | Supports multiple protocols for different workload and client needs. |
| Multi-cluster autoscaling | Scales workloads from zero to max across clusters. |
| Mixed GPU support | Supports mixed GPU types across clusters for workloads with different GPU requirements. |
| Health checks and telemetry | Tracks worker status and request latency through health checks and telemetry. |
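Two of the capabilities above, availability-based routing and scaling from zero to a maximum, lend themselves to a short illustration. The Go sketch below shows one common way to implement each idea: pick the healthy worker with the fewest in-flight requests, and size the replica count from pending work with a cap. It is not NVCF's actual algorithm. `Worker`, `PickWorker`, `DesiredReplicas`, and the per-replica capacity parameter are all assumptions made for the example.

```go
package main

import "fmt"

// Worker is a hypothetical view of one running workload instance.
type Worker struct {
	ID       string
	Healthy  bool // result of the latest health check
	InFlight int  // requests currently being served
}

// PickWorker returns the healthy worker with the fewest in-flight
// requests. The boolean is false when no healthy worker exists.
func PickWorker(workers []Worker) (Worker, bool) {
	best := -1
	for i, w := range workers {
		if !w.Healthy {
			continue
		}
		if best == -1 || w.InFlight < workers[best].InFlight {
			best = i
		}
	}
	if best == -1 {
		return Worker{}, false
	}
	return workers[best], true
}

// DesiredReplicas scales from zero up to max: zero when nothing is
// pending, otherwise enough replicas to cover the pending requests
// (rounded up), capped at max. A non-positive perReplica is treated as 1.
func DesiredReplicas(pending, perReplica, max int) int {
	if pending <= 0 || max <= 0 {
		return 0
	}
	if perReplica <= 0 {
		perReplica = 1
	}
	n := (pending + perReplica - 1) / perReplica
	if n > max {
		return max
	}
	return n
}

func main() {
	workers := []Worker{
		{ID: "w1", Healthy: true, InFlight: 4},
		{ID: "w2", Healthy: false, InFlight: 0},
		{ID: "w3", Healthy: true, InFlight: 1},
	}
	if w, ok := PickWorker(workers); ok {
		fmt.Println("route to", w.ID)
	}
	fmt.Println("replicas:", DesiredReplicas(25, 10, 8))
}
```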
Repository map
| Area | Paths | Purpose |
|------|-------|---------|
| Control plane | [src/control-plane-services/](src/control-plane-services/) | APIs and services that manage NVCF function and deployment state. |
| Invocation plane | [src/invocation-plane-services/](src/invocation-plane-services/) | HTTP invocation, gRPC proxying, rate limiting, LLM gateway paths, and request authorization. |
| Compute plane | [src/compute-plane-services/](src/compute-plane-services/) | GPU cluster integration, cache services, image credentials, ESS Agent, and telemetry collection. |
| CLI and libraries | [src/clis/](src/clis/), [src/libraries/](src/libraries/) | User and developer clients plus shared Go and Python code. |
| Deployment | [deploy/](deploy/), [migrations/](migrations/) | Helm charts, stack installation, infrastructure services, and datastore migrations. |
| Documentation | [docs/user/](docs/user/index.md), [docs/dev/](docs/dev/), [fern/](fern/) | Self-managed user docs, developer docs, and published docs navigation. |
| Examples | [examples/](examples/) | Local development guides, function samples, and load-test assets. |
| Tools | [tools/](tools/) | Build, docs, dependency, license, and validation utilities. |
| AI tooling | [ai-tooling/](ai-tooling/) | Public agent skills and workflow helpers for NVCF users and developers. |
Building with Bazel
Bazel is the build, test, and packaging tool across the monorepo. Native subtrees (`src/clis/nvcf-cli`, `src/libraries/go/lib`) build fully under Bazel today. Phase B has additionally landed Bazel scaffolds in synthetic-import upstreams: `nvcf-grpc-proxy`, `nvcf-ratelimiter`, `nvcf-nats-auth-callout-service`, `nvcf-cache`/`nvcf-unbound` (dns-cache), `nvcf-image-credential-helper`, and `nvca`. Their `BUILD.bazel`, `MODULE.bazel`, and `rules/oci/` files are picked up automatically when the subtrees are synced into the umbrella. From the umbrella you can build, test, and produce OCI images for any of them without leaving the monorepo.
Quick start (Linux):
```bash
curl -fSL -o ~/.local/bin/bazel \
  "https://github.com/bazelbuild/bazelisk/releases/download/v1.25.0/bazelisk-linux-$(dpkg --print-architecture)"
chmod +x ~/.local/bin/bazel

# Native subtrees
bazel build //src/clis/nvcf-cli:nvcf-cli   # host binary
bazel test //src/clis/nvcf-cli/...         # unit tests
bazel build //src/clis/nvcf-cli:dist       # all 5 platforms

# Phase B upstream example: build the grpc-proxy multi-arch OCI image
bazel build //src/invocation-plane-services/grpc-proxy:image_index
bazel test //src/invocation-plane-services/grpc-proxy/...

# Run the full tree
bazel test //...
```
Quick start (macOS):
```bash
brew install bazelisk
bazel build…
```
Excerpt shown — open the source for the full document.