NVIDIA/stdexec
Description: std::execution, the standard C++ framework for asynchronous and parallel programming.
Language: C++
License: Apache-2.0
Stars: 2360
Forks: 247
Open issues: 151
Created: 2021-05-10T22:13:18Z
Pushed: 2026-06-10T17:43:47Z
Default branch: main
Fork: no
Archived: no
README:
stdexec — Senders for C++
A reference implementation of `std::execution` ([\[exec\]](https://wg21.link/exec)), the C++26 model for asynchronous and parallel programming.
 
stdexec lets you express asynchronous work as composable, lazy *sender* pipelines that can run on threads, thread pools, GPUs, or any custom execution context — with structured concurrency guarantees.
> [!WARNING]
> stdexec is experimental and tracks an evolving standard. APIs may change without notice. NVIDIA does not guarantee fitness for any particular purpose.
Table of contents
- [Example](#example)
- [Features](#features)
- [Compiler support](#compiler-support)
- [Installation](#installation)
- [Quick start](#quick-start)
- [GPU support](#gpu-support)
- [Examples gallery](#examples-gallery)
- [Documentation](#documentation)
- [Building tests and examples](#building-tests-and-examples)
- [IDE support](#ide-support)
- [Resources](#resources)
- [Contributing](#contributing)
- [Citation](#citation)
- [License](#license)
Example
Run three pieces of work concurrently on the system thread pool. Try it live on godbolt.
#include <stdexec/execution.hpp>
#include <cstdio>
#include <utility>

namespace ex = stdexec;

int main() {
  auto sched = ex::get_parallel_scheduler();
  auto fun = [](int i) { return i * i; };

  // Build a lazy pipeline: three squares, computed in parallel.
  auto work = ex::when_all(ex::on(sched, ex::just(0) | ex::then(fun)),
                           ex::on(sched, ex::just(1) | ex::then(fun)),
                           ex::on(sched, ex::just(2) | ex::then(fun)));

  // Launch the work and wait for the result.
  auto [i, j, k] = ex::sync_wait(std::move(work)).value();
  std::printf("%d %d %d\n", i, j, k); // prints "0 1 4"
}

Features
- C++26 reference implementation of `std::execution` (P2300).
- Header-only, no external dependencies.
- Composable algorithms: `then`, `let_value`, `when_all`, `bulk`, `split`, `transfer`, `upon_*`, ...
- Structured concurrency primitives: `async_scope`, `task`, `finally`, `when_any`, `repeat_n`, ...
- Pluggable schedulers: system parallel scheduler, static thread pool, Linux `io_uring` context, NVIDIA GPU contexts, your own.
- GPU offload via `nvexec` schedulers (`nvc++` compiler).
- Coroutine interop: senders are awaitable; awaitables are senders.
- Generic extensions (``) for primitives not (yet) in the standard.
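To make "lazy" and "composable" concrete, here is a toy model in plain standard C++. It is a sketch only and deliberately does not use stdexec: `toy_just`, `toy_then`, `toy_then_fn`, and `lazy_demo` are hypothetical names invented for this illustration. It models just one property of sender pipelines, that composing with `|` stores the pieces and runs nothing until the pipeline is explicitly started. Real senders also involve receivers, completion channels, and schedulers, none of which appear here.

```cpp
#include <cassert>
#include <utility>

// Toy stand-ins, NOT stdexec types. A "sender" here is any object with
// a run() member that performs the stored work when called.
template <class T>
struct toy_just {
  T value;
  T run() const { return value; }
};

template <class Sender, class Fn>
struct toy_then {
  Sender upstream;
  Fn fn;
  // Run the upstream work first, then apply fn to its result.
  auto run() const { return fn(upstream.run()); }
};

// Helper so that `sender | toy_then_fn(f)` reads like a pipeline.
template <class Fn>
struct toy_then_closure {
  Fn fn;
};

template <class Fn>
toy_then_closure<Fn> toy_then_fn(Fn fn) {
  return {std::move(fn)};
}

template <class Sender, class Fn>
toy_then<Sender, Fn> operator|(Sender s, toy_then_closure<Fn> c) {
  return {std::move(s), std::move(c.fn)};
}

int lazy_demo() {
  int calls = 0;
  auto square = [&calls](int x) { ++calls; return x * x; };

  // Composing the pipeline only stores the value and the two callables.
  auto work = toy_just<int>{3} | toy_then_fn(square) | toy_then_fn(square);
  assert(calls == 0);  // nothing has executed yet

  // Explicitly start the work (loosely analogous to sync_wait).
  int result = work.run();
  assert(calls == 2);  // both stages ran exactly once
  return result;       // (3 * 3) * (3 * 3)
}
```

Calling `lazy_demo()` returns 81. The two internal assertions check that no stage runs while the pipeline is being built and that each stage runs once when it is started.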
Compiler support
| Compiler | Minimum version | Notes |
|---|---|---|
| GCC | 12 | |
| Clang | 16 | |
| MSVC | 14.43 | |
| Xcode (Apple Clang) | 16 | |
| nvc++ | 25.9 | required for [GPU support](#gpu-support) |
Requires -std=c++20 or later.
> [!NOTE]
> stdexec does not yet support NVIDIA's nvcc compiler.
Installation
Pick whichever fits your project.
CPM (recommended)
CPM fetches and configures stdexec automatically from your CMakeLists.txt:
CPMAddPackage(
  NAME stdexec
  GITHUB_REPOSITORY NVIDIA/stdexec
  GIT_TAG main # or a specific tag
)
target_link_libraries(my_target PRIVATE STDEXEC::stdexec)
add_subdirectory
Clone alongside your project and add it as a subdirectory:
git clone https://github.com/NVIDIA/stdexec.git
add_subdirectory(stdexec)
target_link_libraries(my_target PRIVATE STDEXEC::stdexec)
Conan
A [conanfile.py](conanfile.py) is provided for use with the Conan package manager.
NVIDIA HPC SDK
Starting with NVHPC SDK 22.11, stdexec is bundled with nvc++. Pass --experimental-stdpar to put stdexec headers on the include path. Add -stdpar=gpu for GPU features. See the godbolt example.
Manual include path
stdexec is header-only, so adding -I/include to your compile command is sufficient. Using the CMake target is recommended because it sets the required compile flags.
Quick start
A minimal CMakeLists.txt using CPM:
cmake_minimum_required(VERSION 3.25.0)
project(stdexec_example LANGUAGES CXX)

include(CPM.cmake) # see https://github.com/cpm-cmake/CPM.cmake#adding-cpm

CPMAddPackage(
  NAME stdexec
  GITHUB_REPOSITORY NVIDIA/stdexec
  GIT_TAG main
)

add_executable(example example.cpp)
target_link_libraries(example PRIVATE STDEXEC::stdexec)
GPU support
stdexec ships GPU schedulers in [include/nvexec/](include/nvexec/) for use with `nvc++ -stdpar=gpu`:
| Scheduler | Header | Description |
|---|---|---|
| `nvexec::stream_scheduler` | [`nvexec/stream_context.cuh`](include/nvexec/stream_context.cuh) | Single-GPU scheduler (device 0). |
| `nvexec::multi_gpu_stream_scheduler` | [`nvexec/multi_gpu_context.cuh`](include/nvexec/multi_gpu_context.cuh) | Multi-GPU scheduler across all visible devices. |
Live example:
Examples gallery
The [examples/](examples/) directory contains runnable programs demonstrating the library.
| Example | What it shows |
|---|---|
| [hello_world.cpp](examples/hello_world.cpp) | The "hello world" of senders. |
| [hello_coro.cpp](examples/hello_coro.cpp) | Awaiting a sender from a coroutine. |
| [then.cpp](examples/then.cpp) | Writing a `then` algorithm from scratch. |
| [retry.cpp](examples/retry.cpp) | Writing a `retry` algorithm from scratch. |
| [scope.cpp](examples/scope.cpp) | Structured concurrency with `async_scope`. |
| [io_uring.cpp](examples/io_uring.cpp) | Async I/O via the Linux `io_uring` context. |
| [sudoku.cpp](examples/sudoku.cpp) | A parallel sudoku solver. |
| [server_theme/](examples/server_theme/) | Server-style patterns (`let_value`, `split`, `bulk`, `transfer`). |
| [nvexec/](examples/nvexec/) | GPU schedulers, including the Maxwell solver. |
Documentation
📖 Full documentation:
- User guide: ([source](docs/source/user/))
- Reference: ([source](docs/source/reference/))
- Developer docs: ([source](docs/source/developer/))
- Contributing to docs:…