NVIDIA Levels Up Local AI Agents Across RTX PCs and DGX Spark
Source: Faster Local AI Agents on RTX PCs and DGX Spark | NVIDIA Blog
Personal agents are exploding in popularity, with open source projects like OpenClaw and Hermes seeing rapid adoption by AI developer communities on GitHub. Built to adapt to individual preferences and workflows, these agents can interact with applications, generate content, automate repetitive processes and manage multi-step tasks — all while running locally on device.
Today at NVIDIA GTC Taipei at COMPUTEX, NVIDIA unveiled NVIDIA RTX Spark, a new class of Windows PCs purpose-built for personal agents, alongside a wave of updates that expand local agents across the broader NVIDIA RTX and DGX ecosystems.
Running agents securely and privately requires hardware that’s up to the task. RTX Spark’s 1 petaflop of AI compute and 128GB of unified memory can meet the computing demand of on-device agents, offering a new class of computer that goes from tool to teammate. Designed for AI, creating and gaming, RTX Spark brings NVIDIA’s 30 years of technology innovation to slim Windows laptops with all-day battery life and ultraefficient desktop PCs.
NVIDIA’s partnership with Windows scales from personal to enterprise solutions. Also introduced at the show was NVIDIA DGX Station for Windows, the ultimate AI deskside supercomputer for professionals, bringing a data-center-class GPU and CPU for inference in a desktop system equipped with Windows for manageability, security and compatibility.
Other announcements include:
The NVIDIA OpenShell runtime, built on Microsoft’s new security primitives for agents, is coming to Windows, giving developers an easy-to-deploy package for secure, on-device agents. Hermes Agent and OpenClaw will also integrate OpenShell and the Microsoft security primitives into their new Windows applications.
The NVIDIA NemoClaw blueprint is expanding across NVIDIA’s full local AI lineup — GeForce RTX, RTX PRO, RTX and DGX Spark, and DGX Station — with new streamlined installers and support for Hermes Agent.
2x inference performance on top agentic models with multi-token prediction in llama.cpp and vLLM, as well as new multi-GPU optimizations for llama.cpp and ComfyUI.
H Company is releasing computer-use tools — including new models and an upcoming desktop agent harness — optimized for RTX and DGX PCs.
Adobe is rearchitecting its Photoshop and Premiere apps, Blender is adding NVIDIA DLSS 4.5 Ray Reconstruction, and NVIDIA unveiled RTX Video Frame Generation, which will be coming to ComfyUI. All these updates arrive this fall with RTX Spark.
The NVIDIA Broadcast 2.2 update brings Studio Voice feature optimizations and Elgato Stream Deck support. NVIDIA Project G-Assist also adds Stream Deck integration.
Local Agentic AI: Personal, Private and Fast on Windows RTX PCs
Broad agent adoption has been limited by the inability to run agents securely and privately on users’ primary PCs.
NVIDIA and Microsoft are partnering to address this challenge by delivering a robust, secure Windows platform for on-device agents.
The collaboration begins with a strong foundation — new Windows security primitives and the NVIDIA OpenShell runtime — to ensure agents run safely and under full user control.
The new Windows primitives deliver identity, containment, policy and end-to-end security capabilities to build and run agents natively. NVIDIA OpenShell provides additional policy capabilities for the user to define what agents can and cannot do, the ability to intelligently route queries to local models based on the user’s privacy policies, and the ability to disguise personal information in queries sent to cloud models.
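The routing and disguising behavior described above can be pictured with a small sketch. This is not the OpenShell API, which the announcement does not detail. The `PrivacyPolicy` class, the `route` and `redact` functions, the keyword list and the two regular expressions are all hypothetical, chosen only to illustrate the idea of keeping sensitive queries on a local model and masking personal details in anything sent to the cloud.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for illustration only; real PII detection is far broader.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


@dataclass
class PrivacyPolicy:
    # Assumed policy shape: topics the user wants to keep on-device.
    local_only_keywords: tuple = ("medical", "salary", "password")


def redact(text: str) -> str:
    """Replace email addresses and phone numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


def route(query: str, policy: PrivacyPolicy) -> tuple:
    """Return (destination, text_to_send) for a query under the given policy."""
    lowered = query.lower()
    if any(keyword in lowered for keyword in policy.local_only_keywords):
        # Sensitive topic: the query stays on the local model, unmodified.
        return ("local", query)
    # Otherwise the query may go to a cloud model, with personal details masked.
    return ("cloud", redact(query))


policy = PrivacyPolicy()
print(route("Summarize my salary history", policy))
print(route("Email bob@example.com about lunch", policy))
```

The first query matches a local-only keyword and is routed to the local model as written. The second is allowed to leave the device, but only after the email address is swapped for a placeholder.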
This robust security and privacy layer is being adopted by leading agent developers such as Hermes Agent and OpenClaw in their new Windows apps. These new apps will make it easy and secure for users to access powerful on-device agents that can execute tasks in Windows applications, reason through cross-app workflows, generate images and video, code plug-ins and apps, and semantically search local files.
Powering agents on local devices requires both robust security and performant hardware. RTX Spark features up to 1 petaflop of AI compute and 128GB of unified memory to meet the processing demands of on-device agents.
NVIDIA is also accelerating the local open model ecosystem these agents rely on.
NVIDIA collaborated with the llama.cpp community to enable features and optimizations such as multi-token prediction (MTP), a speculative decoding technique where a smaller draft model proposes multiple tokens at a time that the target model verifies in a single pass. Coupled with other optimizations such as programmatic dependent launch, this delivers 2x performance on Qwen 3.6 and 3.5 27B, and a 1.6x performance boost on Qwen 3.6 and 3.5 35B. These updates are available via the llama.cpp webUI and LM Studio.
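The propose-then-verify loop behind speculative decoding can be sketched with toy models. This is a minimal illustration under stated assumptions, not the llama.cpp implementation: `target_model` and `draft_model` are hypothetical stand-ins that map a token list to the next token, verification is greedy (exact match), and the per-position calls to the target stand in for what a real engine scores in one batched forward pass.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]


def target_model(ctx: List[int]) -> int:
    # Toy "large" model: always counts upward modulo 10.
    return (ctx[-1] + 1) % 10


def draft_model(ctx: List[int]) -> int:
    # Toy "small" model: agrees with the target except right after a 4.
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


def speculative_step(ctx: List[int], draft: NextToken, target: NextToken, k: int) -> List[int]:
    """Draft proposes k tokens; target keeps the matching prefix plus one token."""
    proposal: List[int] = []
    for _ in range(k):
        proposal.append(draft(ctx + proposal))

    accepted: List[int] = []
    # A real engine scores all k + 1 positions in a single forward pass.
    for i in range(k):
        expected = target(ctx + proposal[:i])
        if expected != proposal[i]:
            accepted.append(expected)  # first mismatch: take the target's token and stop
            return accepted
        accepted.append(proposal[i])
    accepted.append(target(ctx + proposal))  # all matched: one bonus token from the target
    return accepted


def generate(prompt: List[int], draft: NextToken, target: NextToken,
             k: int, n_tokens: int) -> Tuple[List[int], int]:
    """Generate n_tokens and report how many verification passes were needed."""
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < n_tokens:
        out += speculative_step(out, draft, target, k)
        passes += 1
    return out[: len(prompt) + n_tokens], passes


tokens, passes = generate([0], draft_model, target_model, k=3, n_tokens=8)
print(tokens, passes)  # [0, 1, 2, 3, 4, 5, 6, 7, 8] 3
```

The output is identical to what the target model would produce on its own, but it takes 3 verification passes rather than 8 single-token steps. The speedup in practice depends on how often the draft's proposals are accepted.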
Figure: Performance gains with the latest NVIDIA optimizations to llama.cpp. Qwen3.6-27B delivers up to 2x throughput and Qwen3.6-35B up to 1.6x on GeForce RTX 5090, accelerating local agentic AI workloads through open source community collaboration.
For AI enthusiasts running multi-GPU rigs, NVIDIA collaborated with the open source community to enhance two of the most popular local AI tools:
llama.cpp adds tensor parallelism for up to 2x memory and 1.8x compute on two equivalent GPUs.
ComfyUI gains a new classifier-free guidance method for up to 2x performance on two equivalent GPUs, plus the option to split model chains across GPUs to take advantage of the combined memory.
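To see why tensor parallelism helps with both memory and compute, consider a single matrix multiply split across two devices. The sketch below is a plain-Python illustration, not llama.cpp code: the function names are hypothetical, the "devices" are just list slices, and it assumes the number of weight columns divides evenly by the number of shards.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: List[float], w: Matrix) -> List[float]:
    """Multiply a vector x (length n) by a matrix w (n rows, m columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def split_columns(w: Matrix, parts: int) -> List[Matrix]:
    """Split w column-wise into equal shards, one per device."""
    step = len(w[0]) // parts  # assumes the column count is divisible by parts
    return [[row[p * step:(p + 1) * step] for row in w] for p in range(parts)]


def tensor_parallel_matmul(x: List[float], w: Matrix, parts: int = 2) -> List[float]:
    """Each shard computes its slice of the output; slices are then concatenated."""
    shards = split_columns(w, parts)
    # On real hardware these partial products would run concurrently, one per GPU.
    partials = [matmul(x, shard) for shard in shards]
    return [value for part in partials for value in part]


x = [1.0, 2.0]
w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]
print(matmul(x, w))                  # [11.0, 14.0, 17.0, 20.0]
print(tensor_parallel_matmul(x, w))  # [11.0, 14.0, 17.0, 20.0]
```

Each shard holds half of the weights and does half of the arithmetic, which is where the combined-memory and compute gains come from. Real implementations also pay a communication cost to gather results, one reason the compute gain on two GPUs is quoted as up to 1.8x rather than 2x.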
Figure: Token generation performance improvements for the tensor-parallel multi-GPU technique over pipeline-parallel and single-GPU inference in llama.cpp.
Figure: Generation time performance improvements for multi-GPU techniques in ComfyUI.
NVIDIA is also expanding agent capabilities with H Company. H Company’s computer-use harness lets agents navigate a PC by seeing the screen and operating a mouse and keyboard just like a user, even in apps with no application programming interfaces, and is coming soon to RTX and DGX PCs with local model support.
NVIDIA has collaborated with H Company to quantize its state-of-the-art Holo Computer Use models, as well as accelerate its harness — driving a 2x speedup on NVIDIA GPUs while reducing memory consumption by 35%. The models are available for download now, and the Holo Desktop app will be available soon.
Agent Optimizations for Linux
For developers who need…