NousResearch/pico
forked from HLC-Lab/pico
Description: PICO: Performance Insights for Collective Operations
License: NOASSERTION
Stars: 4
Forks: 1
Open issues: 0
Created: 2026-02-20T14:23:41Z
Pushed: 2026-02-19T17:54:12Z
Default branch: main
Fork: yes
Parent repository: HLC-Lab/pico
Archived: no
README:
PICO — Performance Insights for Collective Operations
> 💫 If you find PICO useful for your research or benchmarking work, please consider giving it a ⭐ on GitHub!
---
PICO is a lightweight, extensible, and reproducible benchmarking suite for evaluating and tuning collective communication operations across diverse libraries and hardware platforms.
Built for researchers, developers, and system administrators, PICO streamlines the entire benchmarking workflow—from configuration to execution, tracing, and analysis—across MPI, NCCL, and user-defined collectives.
⭐ Highlights
- 📦 Unified micro-benchmarking of both CPU and GPU collectives across a variety of MPI libraries (Open MPI, MPICH, Cray MPICH), NCCL, and user-defined algorithms.
- 🎛️ Guided configuration via a fully fledged Textual TUI, or a CLI-driven JSON/flag workflow with per-site presets.
- 📋 Reproducible runs through environment capture, metadata logging, and timestamped result directories.
- 🧩 Built-in correctness checks for custom collectives and automatic ground-truth validation.
- 🧭 Per-phase instrumentation that goes beyond micro-benchmarking, hence the name PICO.
- 🧵 Queue-friendly orchestration that compiles, ships, and archives jobs seamlessly on SLURM clusters or in local mode for debugging.
- 📊 Bundled plotting, tracing, and scheduling utilities for streamlined post-processing and algorithm engineering.
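The highlights mention built-in correctness checks with automatic ground-truth validation. This excerpt does not describe how PICO implements them, so the following is only a minimal sketch of the general idea, assuming simulated ranks held in plain Python lists: a custom allreduce is compared element-wise against a reference reduction computed directly. Recursive doubling is used purely as an example of a "custom" algorithm, and `reference_allreduce`, `recursive_doubling_allreduce`, and `check_against_ground_truth` are hypothetical names, not part of PICO.

```python
from typing import Callable, List

Buffers = List[List[int]]


def reference_allreduce(buffers: Buffers) -> Buffers:
    """Ground truth: every rank ends up with the element-wise sum of all inputs."""
    total = [sum(values) for values in zip(*buffers)]
    return [list(total) for _ in buffers]


def recursive_doubling_allreduce(buffers: Buffers) -> Buffers:
    """Simulated recursive-doubling allreduce (hypothetical custom algorithm).

    Requires a power-of-two rank count. At each step, rank r exchanges its
    buffer with partner r XOR distance, and both add what they received.
    """
    p = len(buffers)
    if p == 0 or p & (p - 1):
        raise ValueError("rank count must be a power of two")
    state = [list(b) for b in buffers]
    distance = 1
    while distance < p:
        # Snapshot so every rank reads its partner's data from the same step.
        snapshot = [list(b) for b in state]
        for rank in range(p):
            partner = rank ^ distance
            state[rank] = [a + b for a, b in zip(snapshot[rank], snapshot[partner])]
        distance *= 2
    return state


def check_against_ground_truth(candidate: Callable[[Buffers], Buffers],
                               buffers: Buffers) -> bool:
    """Return True if the candidate result matches the reference on every rank."""
    return candidate(buffers) == reference_allreduce(buffers)


if __name__ == "__main__":
    # Eight simulated ranks, each contributing [rank, 10 * rank, 1].
    inputs = [[rank, rank * 10, 1] for rank in range(8)]
    print(check_against_ground_truth(recursive_doubling_allreduce, inputs))
```

Integer inputs keep the comparison exact; a check on floating-point reductions would need a tolerance, because different algorithms sum in different orders.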
Architecture at a Glance
📁 Configuration
├─ 🧩 Sources: Textual TUI • JSON • CLI flags
└─ ⚙️ Validation & module loading via submit_wrapper.sh

🚀 Orchestration
├─ 🧵 scripts/orchestrator.sh iterates over:
│    • Libraries × Collectives × Message Sizes
└─ 🏗️ Builds binaries and dispatches jobs (SLURM or local)

🧠 Execution
├─ pico_core / libpico executables
├─ ✅ Correctness checks
└─ 🧭 Optional per-phase instrumentation

📊 Results
├─ results///
│    • CSV metrics
│    • Logs
│    • Metadata
│    • Archives
└─ Post-processing utilities:
     • plot/ • tracer/ • schedgen/
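The orchestration stage iterates over Libraries × Collectives × Message Sizes. The real loop lives in the shell script scripts/orchestrator.sh and is not shown here; the Python sketch below only illustrates the shape of such a Cartesian sweep. The library names and the `run_case` callback are placeholders, not PICO identifiers.

```python
from itertools import product
from typing import Callable, Iterable, List, Tuple

Case = Tuple[str, str, int]


def sweep(libraries: Iterable[str],
          collectives: Iterable[str],
          sizes: Iterable[int],
          run_case: Callable[[str, str, int], None]) -> List[Case]:
    """Visit every (library, collective, message size) combination once.

    The order mirrors nested loops: library outermost, message size innermost.
    """
    visited: List[Case] = []
    for library, collective, size in product(libraries, collectives, sizes):
        run_case(library, collective, size)  # hypothetical per-case launcher
        visited.append((library, collective, size))
    return visited


if __name__ == "__main__":
    cases = sweep(["libA", "libB"], ["allreduce", "allgather"], [64, 1024, 65536],
                  run_case=lambda lib, coll, size: print(lib, coll, size))
    print(len(cases), "cases")
```

Because the sweep is a full Cartesian product, the number of runs grows multiplicatively: two libraries, two collectives, and three message sizes already yield 2 × 2 × 3 = 12 cases, before datatypes or GPU modes are added.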
🚀 Quickstart
The recommended way to use PICO is through its Textual TUI, which guides you from configuration to job submission.
⚙️ 1. Configure Your Environment
Ensure you have at least one valid environment definition under config/environment/.
A working local sample is provided; adapt it to your machine.
For remote clusters, copy one of the existing environment templates and adapt it to your site (a setup wizard to simplify this step is on its way!).
🧭 2. Create a virtual env and launch the TUI
Create and activate a Python virtual environment, then install the Python dependencies used by the TUI and analysis tools:
pip install -r requirements.txt
Start the interactive interface and follow the four-step wizard: configure the environment, select libraries, choose algorithms, and export.
python tui/main.py
🧩 3. Generate a Test Description
Within the TUI, define:
- The target collective(s)
- Message sizes and iteration counts
- Backends (MPI / NCCL / custom)
- Instrumentation and validation settings
The TUI will produce a test descriptor file encapsulating all these options.
The export lands in tests/.json (full configuration) and tests/.sh (shell exports).
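One way to picture the relationship between the two exported files is a configuration mapping rendered as sourceable `export KEY=value` lines. The JSON schema and variable names PICO actually writes are not documented in this excerpt, so the keys below and the helper `to_shell_exports` are hypothetical. `shlex.quote` is used so that values containing spaces or other shell metacharacters survive shell parsing.

```python
import json
import shlex
from typing import Any, Dict


def to_shell_exports(config: Dict[str, Any]) -> str:
    """Render a flat configuration mapping as sourceable `export` lines.

    Lists are joined with commas, matching the comma-separated style the
    CLI uses for sizes and datatypes.
    """
    lines = []
    for key, value in config.items():
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        lines.append(f"export {key.upper()}={shlex.quote(str(value))}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical descriptor contents; not PICO's actual schema.
    descriptor = json.loads(
        '{"collectives": ["allreduce", "allgather"], "sizes": [64, 1024], "location": "local"}'
    )
    print(to_shell_exports(descriptor), end="")
```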
🚀 4. Run the Benchmark
Execute the generated descriptor using the wrapper script, which handles compilation, dispatch, and archival:
scripts/submit_wrapper.sh -f [path_to_test_sh_file]
This command orchestrates the full benchmarking workflow, locally or on SLURM clusters, using your environment definition.
🧰 Optional: CLI Workflow (Legacy)
You can still invoke PICO directly via the CLI to explore options or run ad-hoc tests. To do so, complete step 1 and then run:
scripts/submit_wrapper.sh --help
> ⚠️ Note: The CLI path is currently *partially maintained*; some flags may be deprecated as functionality transitions to the TUI.
Example CLI invocation:
scripts/submit_wrapper.sh \
  --location leonardo \
  --nodes 8 \
  --ntasks-per-node 32 \
  --collectives allreduce,allgather \
  --types int32,double \
  --sizes 64,1024,65536 \
  --segment-sizes 0 \
  --time 01:00:00 \
  --gpu-awareness no
- Provide comma-separated lists for datatypes, message sizes, and segment sizes.
- Use --gpu-awareness yes and --gpu-per-node to benchmark NCCL or CUDA-aware MPI collectives.
- Pass --debug yes for quick validation runs with reduced iterations and debug builds.
- When --compile-only yes is set, the script stops after building bin/pico_core and its GPU counterpart.
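When scripting many ad-hoc CLI runs, it can help to assemble the argument vector programmatically instead of editing a long command by hand. This sketch uses only the flags that appear in the example invocation above; `build_cli_args` is a hypothetical helper, and because the CLI path is only partially maintained, check `scripts/submit_wrapper.sh --help` before relying on any flag.

```python
import shlex
from typing import List, Sequence


def build_cli_args(location: str, nodes: int, ntasks_per_node: int,
                   collectives: Sequence[str], types: Sequence[str],
                   sizes: Sequence[int], segment_sizes: Sequence[int],
                   time_limit: str, gpu_awareness: bool = False) -> List[str]:
    """Assemble a submit_wrapper.sh argument vector from Python values.

    List-valued options are joined with commas, as the CLI expects.
    """
    def csv(values) -> str:
        return ",".join(str(v) for v in values)

    return [
        "scripts/submit_wrapper.sh",
        "--location", location,
        "--nodes", str(nodes),
        "--ntasks-per-node", str(ntasks_per_node),
        "--collectives", csv(collectives),
        "--types", csv(types),
        "--sizes", csv(sizes),
        "--segment-sizes", csv(segment_sizes),
        "--time", time_limit,
        "--gpu-awareness", "yes" if gpu_awareness else "no",
    ]


if __name__ == "__main__":
    args = build_cli_args("leonardo", 8, 32, ["allreduce", "allgather"],
                          ["int32", "double"], [64, 1024, 65536], [0], "01:00:00")
    # Print a copy-pasteable command; pass `args` to subprocess.run to execute it.
    print(shlex.join(args))
```

Keeping the arguments as a list (rather than one string) avoids quoting mistakes if the command is later passed to `subprocess.run` without a shell.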
💻 Dependencies
- A C/C++ compiler and MPI implementation (Open MPI, MPICH, or Cray MPICH). CUDA-aware MPI or NCCL is optional for GPU runs.
- (Optional) CUDA toolkit and a compatible NCCL build for GPU collectives.
- Python 3.9+ with pip for the TUI and analysis utilities (pip install -r requirements.txt).
- SLURM for cluster submissions; local mode is supported for functional testing.
- Basic build tools (make) and a Bash-compatible shell.
🧠 Core Components
- pico_core/: C benchmarking driver that allocates buffers, times collectives, checks results, and writes output.
- libpico/: Library of custom collective algorithms and instrumentation helpers, selectable alongside vendor MPI/NCCL paths.
- scripts/submit_wrapper.sh: Entry point that parses CLI flags or TUI exports, loads site modules, builds binaries, activates Python envs, and launches SLURM or local runs.
- scripts/orchestrator.sh: Node-side runner that sweeps libraries, algorithm sets, GPU modes, message sizes, and datatypes while invoking metadata capture and optional compression.
- config/: Declarative environment, library, and algorithm descriptions consumed by the TUI and CLI (modules to load, compiler wrappers, task/GPU limits).
- tui/: Textual-based UI that guides the user through environment selection, library selection, and algorithm mix, and exports the shell/JSON bundle for later submission.
- plot/: Python package and CLI (python -m plot …) that turns CSV summaries into…
Notability
notability 3.0/10: low-star fork, routine event.