What does this release signal mean?

NVIDIA published v1.6.1 of NVIDIA/cloudai, described as NVIDIA's toolkit for deploying AI models on cloud infrastructure. This release signal is evidence of what shipped, changed, or was packaged for users. High-signal details: the release was published on 2026-06-02T14:59:00Z under tag v1.6.1, is not a prerelease, and its notes open with added support for new workloads. onlylabs links this event to 1 captured evidence page and 6 related release signals.

NVIDIA Release: NVIDIA/cloudai v1.6.1

Captured source

GitHub: github.com/NVIDIA/cloudai

NVIDIA/cloudai v1.6.1

published Jun 2, 2026 · seen Jun 6 · captured Jun 11 · HTTP 200 · method plain

v1.6.1

Repository: NVIDIA/cloudai

Tag: v1.6.1

Published: 2026-06-02T14:59:00Z

Prerelease: no

Release notes:

New Changes

- Added support for the following workloads:
  - **vLLM** - LLM serving benchmark support with Slurm execution, disaggregated prefill/decode mode, multi-node serving, reporting, DSE metrics, and NIXL-related options
  - **SGLang** - LLM serving benchmark support sharing the common vLLM/SGLang serving flow, reporting, health checks, and multi-node execution
  - **NIXL EP** - NIXL Expert Parallelism workload with Slurm command generation, log parsing, reporting, and tests
- Added DSE reporting, including richer visualization of design-space exploration results and best-configuration selection
- Added report generation for MegatronRun and OSU benchmarks
- Added support for CNI specification configuration for NCCL and AI Dynamo workloads on Kubernetes

Backward Compatibility Notes

1. AI Dynamo configuration schema

Worker settings now use explicit prefill_worker and decode_worker blocks with nested args.
Older fields such as prefill-cmd, decode-cmd, top-level worker parallelism keys, run_script, and huggingface_home_container_path should be migrated to the new schema.
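
One way to find configurations that still need migrating is to scan a parsed config for the legacy field names listed above. The sketch below is not part of CloudAI: `find_legacy_keys` and the sample layout are hypothetical, and because the notes do not say where these fields sit in a file, the scan walks the whole structure. The "top-level worker parallelism keys" are not named in the notes, so they are not covered.

```python
from typing import Any, Iterator

# Legacy AI Dynamo fields named in the release notes. Their location inside a
# real config file is an assumption, so the whole structure is searched.
LEGACY_KEYS = {
    "prefill-cmd",
    "decode-cmd",
    "run_script",
    "huggingface_home_container_path",
}


def find_legacy_keys(node: Any, path: str = "") -> Iterator[str]:
    """Yield the dotted path of every legacy key found in nested dicts/lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            here = f"{path}.{key}" if path else str(key)
            if key in LEGACY_KEYS:
                yield here
            yield from find_legacy_keys(value, here)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_legacy_keys(item, f"{path}[{index}]")


# Hypothetical parsed config; the layout is illustrative only.
old_style = {
    "cmd_args": {
        "dynamo": {"prefill-cmd": "placeholder", "decode-cmd": "placeholder"},
        "run_script": "placeholder",
    }
}
legacy_paths = sorted(find_legacy_keys(old_style))
print(legacy_paths)
```

Any path reported this way marks a setting to move into the new prefill_worker or decode_worker blocks, or to drop, according to the CloudAI documentation for the new schema.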

2. Megatron-Bridge configuration schema

model_family_name and model_recipe_name replace the earlier model_name and model_size fields.
time_limit is now taken from the test run rather than cmd_args.
A Megatron-Bridge git repo only overrides the container copy when mount_as = "/opt/Megatron-Bridge" is set.
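
The three Megatron-Bridge changes can be expressed as small checks on an already parsed configuration. This is a sketch under stated assumptions, not CloudAI code: the function names are hypothetical, `cmd_args` is assumed to be a flat dictionary, and a git repo entry is assumed to be a dictionary carrying a `mount_as` key.

```python
from typing import Any, Dict, List

# Mount point named in the release notes as the one that triggers an override.
OVERRIDE_MOUNT = "/opt/Megatron-Bridge"


def megatron_bridge_warnings(cmd_args: Dict[str, Any]) -> List[str]:
    """Return one message per field that the release notes mark as superseded."""
    warnings = []
    for old in ("model_name", "model_size"):
        if old in cmd_args:
            warnings.append(
                f"'{old}' is superseded by model_family_name and model_recipe_name"
            )
    if "time_limit" in cmd_args:
        warnings.append("'time_limit' is now taken from the test run, not cmd_args")
    return warnings


def repo_overrides_container(git_repo: Dict[str, Any]) -> bool:
    """True only when the repo is mounted over the container's own copy."""
    return git_repo.get("mount_as") == OVERRIDE_MOUNT


# Hypothetical inputs for illustration.
print(megatron_bridge_warnings({"model_name": "placeholder", "time_limit": "01:00:00"}))
print(repo_overrides_container({"url": "placeholder", "mount_as": "/opt/Megatron-Bridge"}))
```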

3. Custom workload implementations

Custom workloads that override constraint_check(self, tr) should update the method signature to accept the new system argument.
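
A minimal sketch of the signature change follows. The release notes only state that the method must accept a new system argument, so the argument's position, its type, the boolean return value, and the stub classes here are all assumptions made for illustration; the real CloudAI base classes are not reproduced.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class StubSystem:
    """Hypothetical stand-in for a CloudAI system object."""
    gpus_per_node: int


@dataclass
class StubTestRun:
    """Hypothetical stand-in for a CloudAI test run."""
    num_nodes: int
    cmd_args: Dict[str, Any] = field(default_factory=dict)


class MyWorkload:
    # Earlier signature named in the release notes: constraint_check(self, tr)
    def constraint_check(self, tr: StubTestRun, system: StubSystem) -> bool:
        """Accept the run only if tensor parallelism divides the available GPUs."""
        total_gpus = tr.num_nodes * system.gpus_per_node
        tp = int(tr.cmd_args.get("tensor_parallel", 1))
        return tp > 0 and total_gpus % tp == 0


workload = MyWorkload()
print(workload.constraint_check(StubTestRun(2, {"tensor_parallel": 8}), StubSystem(8)))
```

Passing the system alongside the test run lets a constraint depend on hardware facts, such as the GPU count per node used here, without the workload having to look them up elsewhere.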

LLM Serving Improvements

CloudAI now includes first-class support for vLLM and SGLang serving workloads. The implementation includes shared serving infrastructure, Slurm command generation, result reporting, disaggregated prefill/decode support, two-node serving flows, custom health check endpoints, and more robust startup, shutdown, and cleanup handling. vLLM also supports DSE metrics, NIXL thread options, boolean flag handling, and constraint checks.

Megatron and Megatron-Bridge Improvements

Megatron-Bridge support was updated for r0.3.0 recipes and improved configuration handling. GPU counts can be derived from the system configuration, time limits are managed by the test run, VP parameters are handled more reliably, and status checks reduce false passes. MegatronRun now has report generation support and improved success detection, including timeout handling.

NIXL, Kubernetes, and Networking

NIXL workloads gained a new EP workload, updated CLI argument handling, support for separate ETCD containers, improved ETCD failure handling, safer mount cleanup, and installable fixes around nested Docker image paths and submodules. Kubernetes support was improved with CNI spec handling for NCCL and AI Dynamo, while NCCL Kubernetes tests were refactored for better reuse and temporary-resource management.

Reporting, Configuration, and Parsing

Reporting now includes DSE reports, OSU benchmark reports, MegatronRun reports, and reward override support for constraint failures. Configuration handling is more robust with improved duplicate-key errors, system config detection, path expansion/storage, first-sweep messaging, and agent configuration/caching updates.

Architecture, Reliability, and Tooling

Job monitoring no longer relies on asyncio, heavy imports are blocked at module level, and command shell checks no longer run during object creation. Slurm handling was improved around node exclusion, reservation nodes, GPU resource requesting, and propagation of extra Slurm arguments. Tooling was refreshed with pre-commit, updated CI workflows, uv usage in CI, Node 24-compatible GitHub Actions, broader tests organized by system/workload, and dependency updates.
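
Keeping heavy imports out of module level usually means deferring them into the function that needs them, so that importing the module stays cheap. The sketch below shows that general pattern only; it is not taken from the CloudAI code base, `render_report` is a hypothetical name, and the standard-library `csv` module stands in for a genuinely heavy dependency.

```python
from typing import Iterable, Sequence


def render_report(rows: Iterable[Sequence[object]]) -> str:
    """Render rows as CSV text, importing the dependencies only when called."""
    # Deferred imports: the cost is paid on first use, not at module import.
    import csv
    import io

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "value"])
    writer.writerows(rows)
    return buffer.getvalue()


report_text = render_report([("latency", 1.5)])
print(report_text)
```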

Documentation

Documentation was expanded for vLLM, SGLang, NIXL EP, Systems, workload requirements, reporting, troubleshooting, and tutorial/user guide content. Workload pages and release configurations were updated to match the new workloads and configuration flows.

All Changed

Bump to v1.6 + upgrade dependencies by @amaslenn in https://github.com/NVIDIA/cloudai/pull/798
Upgrade GitHub Actions to latest versions by @salmanmkc in https://github.com/NVIDIA/cloudai/pull/751
Upgrade GitHub Actions for Node 24 compatibility by @salmanmkc in https://github.com/NVIDIA/cloudai/pull/750
Ban "heavy" imports on module level by @amaslenn in https://github.com/NVIDIA/cloudai/pull/801
Remove asyncio usage in jobs monitoring by @amaslenn in https://github.com/NVIDIA/cloudai/pull/796
Bump pillow from 12.1.0 to 12.1.1 by @dependabot[bot] in https://github.com/NVIDIA/cloudai/pull/802
Add report generation strategy for the MegatronRun by @juntaowww in https://github.com/NVIDIA/cloudai/pull/787
Fix accedentially reverted version bump by @amaslenn in https://github.com/NVIDIA/cloudai/pull/805
Add support for running vLLM by @amaslenn in https://github.com/NVIDIA/cloudai/pull/799
Unit-tests per system/workload by @podkidyshev in https://github.com/NVIDIA/cloudai/pull/808
Fix nsys subfield merging behavior by @juntaowww in https://github.com/NVIDIA/cloudai/pull/795
Add support for setting NIXL num threads for vLLM CLI by @amaslenn in https://github.com/NVIDIA/cloudai/pull/809
Fix base_tr fixture dependency by @podkidyshev in https://github.com/NVIDIA/cloudai/pull/810
Fixes CLOUDAI-15: Updated copyright check by @podkidyshev in https://github.com/NVIDIA/cloudai/pull/811
Add report generation for OSU Benchmark by @allkoow in https://github.com/NVIDIA/cloudai/pull/807
Single sbatch + NIXL + ETCD issues by @podkidyshev in https://github.com/NVIDIA/cloudai/pull/812
Support separate ETCD container for NIXL workloads by @amaslenn in https://github.com/NVIDIA/cloudai/pull/813
Yet another attempt on the right copyright by @podkidyshev in https://github.com/NVIDIA/cloudai/pull/815
Refactor NCCL k8s test cases to improve re-use and temp resources management by @amaslenn in https://github.com/NVIDIA/cloudai/pull/817
Support DSE metrics for vLLM by @amaslenn in https://github.com/NVIDIA/cloudai/pull/816
Agent configs by...

Excerpt shown; see the source for the full document.

Notability

notability 4.0/10

Minor update from major company, no traction