NVIDIA/NeMo-Relay
Description: Multi-language agent runtime for execution scope management, lifecycle events, and middleware on tool and LLM calls.
Language: Rust
License: Apache-2.0
Stars: 56
Forks: 26
Open issues: 5
Created: 2026-03-31T23:22:24Z
Pushed: 2026-06-11T01:19:13Z
Default branch: main
Fork: no
Archived: no
README:
 
NVIDIA NeMo Relay
What Is NeMo Relay?
NVIDIA NeMo Relay is a portable execution runtime for agent systems that already have a framework, model provider, policy layer, or observability backend. It gives those systems one consistent way to describe, control, and observe what happens when an agent crosses a request, tool, or LLM boundary.
Agent applications rarely live inside one clean abstraction. A production stack might combine NeMo Agent Toolkit, LangChain, LangGraph, provider SDKs, custom harness code, NeMo Guardrails, tracing systems, and evaluation pipelines. NeMo Relay sits underneath those choices as the shared runtime contract for scopes, middleware, plugins, lifecycle events, adaptive behavior, and observability.
Built as a Rust core with primary Rust, Python, and Node.js bindings, NeMo Relay lets applications keep their orchestration model while runtime behavior stays consistent across frameworks and languages.
Why Use It?
- 🧭 Own execution context across the whole agent run: Hierarchical scopes
attach tools, LLM calls, middleware, subscribers, and events to the same parent-child execution tree.
- 🛡️ Package policy once: Guardrails and intercepts can block work, sanitize
observability payloads, transform requests, or wrap execution without rewriting every call site.
- 📡 Emit one lifecycle stream: Subscribers consume canonical runtime events
in-process or export them as ATIF v1.7 trajectories, OpenTelemetry traces, or OpenInference-compatible traces.
- 🧩 Integrate without a framework migration: NeMo Relay can sit below NeMo
ecosystem components, third-party agent frameworks, provider adapters, or direct application code.
- ⚙️ Install reusable runtime behavior: Plugins configure middleware,
subscribers, adaptive components, observability exporters, and custom runtime behavior from one shared system.
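
The scope tree and the "package policy once" idea can be illustrated with a small std-only sketch. This is not the `nemo-relay` API: `Runtime`, `Event`, `Decision`, `Guardrail`, and the event kind strings are hypothetical names invented for illustration. It only shows the shape of the idea: events carry their scope's parent, and guardrails run in registration order before a tool executes.

```rust
use std::collections::HashMap;

/// Hypothetical event record: each event names its scope and that scope's parent.
#[derive(Debug, Clone, PartialEq)]
struct Event {
    scope_id: u32,
    parent_id: Option<u32>,
    kind: String,
}

/// Hypothetical guardrail outcome: pass a (possibly rewritten) request on, or block it.
#[derive(Debug, PartialEq)]
enum Decision {
    Allow(String),
    Block(String),
}

type Guardrail = fn(&str) -> Decision;

struct Runtime {
    next_id: u32,
    parents: HashMap<u32, Option<u32>>,
    guardrails: Vec<Guardrail>,
    events: Vec<Event>,
}

impl Runtime {
    fn new(guardrails: Vec<Guardrail>) -> Self {
        Runtime { next_id: 1, parents: HashMap::new(), guardrails, events: Vec::new() }
    }

    /// Opens a scope under `parent` (None for a root scope) and records its start.
    fn open_scope(&mut self, parent: Option<u32>) -> u32 {
        let id = self.next_id;
        self.next_id += 1;
        self.parents.insert(id, parent);
        self.emit(id, "scope_start");
        id
    }

    fn emit(&mut self, scope_id: u32, kind: &str) {
        let parent_id = self.parents.get(&scope_id).copied().flatten();
        self.events.push(Event { scope_id, parent_id, kind: kind.to_string() });
    }

    /// Runs every guardrail in registration order, then the tool itself.
    fn call_tool(&mut self, scope: u32, request: &str, tool: fn(&str) -> String) -> Result<String, String> {
        let mut req = request.to_string();
        for guardrail in self.guardrails.clone() {
            match guardrail(&req) {
                Decision::Allow(next) => req = next,
                Decision::Block(reason) => {
                    self.emit(scope, "tool_blocked");
                    return Err(reason);
                }
            }
        }
        self.emit(scope, "tool_start");
        let output = tool(&req);
        self.emit(scope, "tool_end");
        Ok(output)
    }
}

/// Example guardrail that transforms the request.
fn redact_secret(req: &str) -> Decision {
    Decision::Allow(req.replace("SECRET", "[redacted]"))
}

/// Example guardrail that blocks work.
fn deny_delete(req: &str) -> Decision {
    if req.contains("rm -rf") {
        Decision::Block("destructive command".to_string())
    } else {
        Decision::Allow(req.to_string())
    }
}

fn echo_tool(req: &str) -> String {
    format!("ran: {req}")
}

fn main() {
    let mut rt = Runtime::new(vec![redact_secret as Guardrail, deny_delete as Guardrail]);
    let root = rt.open_scope(None);
    let child = rt.open_scope(Some(root));
    println!("{:?}", rt.call_tool(child, "echo SECRET", echo_tool));
    println!("{:?}", rt.call_tool(child, "rm -rf /", echo_tool));
    for event in &rt.events {
        println!("{event:?}");
    }
}
```

In this sketch the call sites never mention the guardrails; they are registered once on the runtime, which is the property the list above describes.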
What You Get
- ✅ Managed tool and LLM execution: Run call boundaries through consistent
lifecycle helpers and middleware ordering.
- ✅ Concurrent request isolation: Keep request-local middleware and
subscribers attached to the scope that owns them, then clean them up when that scope closes.
- ✅ Multi-language semantics: Use the same runtime model from Rust, Python,
and Node.js.
- ✅ Observability-ready events: Preserve model metadata, tool call IDs,
inputs, outputs, scope relationships, and lifecycle timing for downstream analysis.
- ✅ Built-in observability plugin: Configure Agent Trajectory Observability
Format (ATOF), ATIF, OpenTelemetry, and OpenInference exporters without registering subscribers by hand.
- ✅ Non-blocking subscriber delivery: Keep managed execution moving while
subscriber callbacks and exporters drain in the background. Flush subscribers before relying on callback side effects or exported files in tests and shutdown paths.
- ✅ Extension points for framework authors: Wrap stable tool and provider
callbacks while preserving framework-owned scheduling, retries, memory, and result handling.
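
The advice to flush subscribers before relying on their side effects follows from how background delivery generally works. The sketch below is a std-only illustration, not the `nemo-relay` API: `BackgroundSubscriber`, `Message`, `emit`, `flush`, and `shutdown` are hypothetical names. It assumes a single producer thread and one worker draining a channel.

```rust
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Messages sent to the worker: an event to deliver, or a flush request with an ack channel.
enum Message {
    Event(String),
    Flush(Sender<()>),
}

/// Hypothetical subscriber handle whose callback runs on a background thread.
struct BackgroundSubscriber {
    tx: Sender<Message>,
    worker: JoinHandle<()>,
}

impl BackgroundSubscriber {
    /// Starts a worker that appends every delivered event to `sink`.
    fn spawn(sink: Arc<Mutex<Vec<String>>>) -> Self {
        let (tx, rx) = mpsc::channel::<Message>();
        let worker = thread::spawn(move || {
            // The loop ends when every sender has been dropped.
            for msg in rx {
                match msg {
                    Message::Event(event) => sink.lock().unwrap().push(event),
                    Message::Flush(ack) => {
                        let _ = ack.send(());
                    }
                }
            }
        });
        BackgroundSubscriber { tx, worker }
    }

    /// Queues an event and returns immediately; the caller is never blocked on delivery.
    fn emit(&self, event: &str) {
        let _ = self.tx.send(Message::Event(event.to_string()));
    }

    /// Blocks until the worker has handled everything queued before this call.
    /// The channel is FIFO, so the ack arrives only after earlier events were delivered.
    fn flush(&self) {
        let (ack_tx, ack_rx) = mpsc::channel();
        if self.tx.send(Message::Flush(ack_tx)).is_ok() {
            let _ = ack_rx.recv();
        }
    }

    /// Closes the channel and waits for the worker to exit.
    fn shutdown(self) {
        let BackgroundSubscriber { tx, worker } = self;
        drop(tx);
        let _ = worker.join();
    }
}

fn main() {
    let sink = Arc::new(Mutex::new(Vec::new()));
    let subscriber = BackgroundSubscriber::spawn(Arc::clone(&sink));
    subscriber.emit("tool_start");
    subscriber.emit("tool_end");
    // Without this flush, the sink may still be empty when it is read below.
    subscriber.flush();
    println!("{:?}", sink.lock().unwrap());
    subscriber.shutdown();
}
```

Reading `sink` straight after `emit` would race with the worker thread, which is why tests and shutdown paths need the explicit flush.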

    flowchart LR
        App[Application or Framework]

        subgraph Runtime[NeMo Relay Runtime]
            direction TB
            Scopes[Scopes]
            Middleware[Middleware]
            Plugins[Plugins]
            Events[Lifecycle Events]
        end

        Output[Subscribers and Exporters]

        App --> Scopes
        App --> Middleware
        Plugins --> Middleware
        Scopes --> Events
        Middleware --> Events
        Events --> Output

Installation
Install the published package for your language:

    # Rust
    cargo add nemo-relay

    # Python
    uv add nemo-relay

    # Node.js
    npm install nemo-relay-node

The Node.js package requires Node.js 24 or newer.
CLI Installation
The NeMo Relay CLI is offered as a separate crate:
cargo install nemo-relay-cli
If cargo-binstall is available on your machine:
cargo binstall nemo-relay-cli
For source builds, testing, and contribution workflow, see [CONTRIBUTING.md](CONTRIBUTING.md).
Documentation
End-user documentation lives at docs.nvidia.com/nemo/relay.
The primary documentation track covers Rust, Python, and Node.js.
The Go, WebAssembly, and raw FFI surfaces are currently experimental and remain source-first under go/nemo_relay, crates/wasm, and crates/ffi.
Binding Status
The table below summarizes the support level for each binding surface.

| Binding | Status | Notes |
|---|---|---|
| Python | ✅ Fully Supported | Fully documented with Quick Start and Guides |
| Node.js | ✅ Fully Supported | Fully documented with Quick Start and Guides |
| Rust | ✅ Fully Supported | Fully documented with Quick Start and Guides |
| NeMo Relay CLI | 🚧 Experimental | Install with cargo install nemo-relay-cli. |
| Go | 🚧 Experimental | Source-first under go/nemo_relay. |
| WebAssembly | 🚧 Experimental | Source-first under crates/wasm. |
| FFI | 🚧 Experimental | Source-first under crates/ffi. |

Agent Harness Support
NeMo Relay CLI offers experimental support for several agent harnesses. Refer to the NeMo Relay CLI documentation for additional information.
Below is our support matrix for agent harnesses.

| Agent | Observability | Security | Optimization | Notes |
|:--|:--:|:--:|:--:|:--|
| Claude Code | ✅ Yes | ⚠️ Partial | ⚠️ Partial | Tool guardrail support is wired up. LLM optimization is in place. |
| Codex | ✅ Yes | ⚠️ Partial | ⚠️ Partial | Tool guardrail support is wired up. LLM optimization is in place. Missing some necessary hooks for full feature parity. |
| Hermes Agent | ✅ Yes | ⚠️ Partial | ⚠️ Partial | Tool guardrail support is wired up. LLM optimization is in place. |
| Cursor | ✅ Yes | ⚠️ Partial | ⚠️ Partial | Tool guardrail support is wired up. LLM optimization is in place. Not feature-rich; missing hooks under cursor-agent. |

Third-Party Integrations
Some framework integrations are maintained as packages…
Notability: 6.0/10 (new NVIDIA repo, moderate traction)