Fork · Nous Research · published Feb 20, 2026

NousResearch/pico

forked from HLC-Lab/pico

Description: PICO: Performance Insights for Collective Operations

License: NOASSERTION

Stars: 4

Forks: 1

Open issues: 0

Created: 2026-02-20T14:23:41Z

Pushed: 2026-02-19T17:54:12Z

Default branch: main

Fork: yes

Parent repository: HLC-Lab/pico

Archived: no

README:

PICO — Performance Insights for Collective Operations

> 💫 If you find PICO useful for your research or benchmarking work, please consider giving it a ⭐ on GitHub!

---

PICO is a lightweight, extensible, and reproducible benchmarking suite for evaluating and tuning collective communication operations across diverse libraries and hardware platforms.

Built for researchers, developers, and system administrators, PICO streamlines the entire benchmarking workflow—from configuration to execution, tracing, and analysis—across MPI, NCCL, and user-defined collectives.

⭐ Highlights

  • 📦 Unified micro-benchmarking of both CPU and GPU collectives, across a variety of MPI libraries (Open MPI, MPICH, Cray MPICH), NCCL and user-defined algorithms.
  • 🎛️ Guided configuration via a fully fledged Textual TUI or CLI-driven JSON/flag workflow with per-site presets.
  • 📋 Reproducible runs through environment capture, metadata logging, and timestamped result directories.
  • 🧩 Built-in correctness checks for custom collectives and automatic ground-truth validation.
  • 🧭 Per-phase instrumentation that goes beyond micro-benchmarking, hence the name PICO.
  • 🧵 Queue-friendly orchestration that compiles, ships, and archives jobs seamlessly on SLURM clusters or in local mode for debugging.
  • 📊 Bundled plotting, tracing, and scheduling utilities for streamlined post-processing and algorithm engineering.
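To make the reproducibility bullet concrete, here is a minimal Python sketch of a timestamped result directory with a metadata file. It is an illustration only: the directory layout, the `metadata.json` file name, the `create_run_dir` helper, and the captured fields are all assumptions, not PICO's actual format.

```python
import json
import os
import platform
import sys
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def create_run_dir(root: Path, label: str) -> Path:
    """Create <root>/<label>/<UTC timestamp>/ and record basic metadata."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    run_dir = root / label / stamp
    run_dir.mkdir(parents=True, exist_ok=False)

    metadata = {
        "label": label,
        "created_utc": stamp,
        "hostname": platform.node(),
        "python": sys.version.split()[0],
        # Hypothetical allow-list: capture only a few environment variables.
        "env": {k: os.environ[k] for k in ("PATH", "LD_LIBRARY_PATH") if k in os.environ},
    }
    (run_dir / "metadata.json").write_text(json.dumps(metadata, indent=2))
    return run_dir


# Demonstrate in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = create_run_dir(Path(tmp), "local")
    print(sorted(p.name for p in demo_dir.iterdir()))
```

Writing the metadata next to the results, rather than in a central log, keeps each run directory self-describing when it is archived or copied elsewhere.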

Architecture at a Glance

📁 Configuration
├─ 🧩 Sources: Textual TUI • JSON • CLI flags
└─ ⚙️ Validation & module loading via submit_wrapper.sh

🚀 Orchestration
├─ 🧵 scripts/orchestrator.sh iterates over:
│ • Libraries × Collectives × Message Sizes
└─ 🏗️ Builds binaries and dispatches jobs (SLURM or local)

🧠 Execution
├─ pico_core / libpico executables
├─ ✅ Correctness checks
└─ 🧭 Optional per-phase instrumentation

📊 Results
├─ results///
│ • CSV metrics
│ • Logs
│ • Metadata
│ • Archives
└─ Post-processing utilities:
   • plot/ • tracer/ • schedgen/
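The sweep in the orchestration stage is a Cartesian product over libraries, collectives, and message sizes. The Python sketch below illustrates that idea only; the real loop lives in the Bash script scripts/orchestrator.sh, and the axis values and the `sweep` helper here are assumptions chosen for the example.

```python
from itertools import product

# Hypothetical sweep axes; PICO reads the real ones from its configuration.
libraries = ["openmpi", "nccl"]
collectives = ["allreduce", "allgather"]
message_sizes = [64, 1024, 65536]


def sweep(libraries, collectives, message_sizes):
    """Yield one (library, collective, size) tuple per benchmark run."""
    yield from product(libraries, collectives, message_sizes)


runs = list(sweep(libraries, collectives, message_sizes))
print(len(runs))  # 2 * 2 * 3 = 12 combinations
print(runs[0])    # ('openmpi', 'allreduce', 64)
```

The run count grows multiplicatively with each axis, which is worth keeping in mind when choosing a job time limit.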

🚀 Quickstart

The recommended way to use PICO is through its Textual TUI, which guides you from configuration to job submission.

⚙️ 1. Configure Your Environment

Ensure you have at least one valid environment definition under config/environment/.

A working local sample is provided; modify it for your machine.

For remote clusters, copy one of the existing environment templates and adapt it to your site. A setup wizard to simplify this configuration is on its way.

🧭 2. Create a virtual env and launch the TUI

Create and activate a Python virtual environment, then install the Python dependencies used by the TUI and analysis tools:

pip install -r requirements.txt

Start the interactive interface and follow the four-step wizard: configure the environment, select libraries, choose algorithms, and export.

python tui/main.py

🧩 3. Generate a Test Description

Within the TUI, define:

  • The target collective(s)
  • Message sizes and iteration counts
  • Backends (MPI / NCCL / custom)
  • Instrumentation and validation settings

The TUI will produce a test descriptor file encapsulating all these options.

The export lands in tests/.json (full configuration) and tests/.sh (shell exports).
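As a rough illustration of a paired JSON and shell export, the Python sketch below writes one descriptor in both forms. Everything specific in it is an assumption: the field names, the `export KEY=value` layout, the `yes`/`no` encoding of booleans, and the `demo` file name are invented for the example and do not reflect the schema the TUI actually emits.

```python
import json
import shlex
import tempfile
from pathlib import Path

# Hypothetical descriptor; field names are illustrative, not PICO's schema.
descriptor = {
    "collectives": ["allreduce", "allgather"],
    "sizes": [64, 1024, 65536],
    "iterations": 100,
    "backend": "mpi",
    "validate": True,
}


def to_shell_exports(desc: dict) -> str:
    """Render each key as `export KEY=value`, joining lists with commas."""
    lines = []
    for key, value in desc.items():
        if isinstance(value, bool):
            text = "yes" if value else "no"
        elif isinstance(value, list):
            text = ",".join(str(v) for v in value)
        else:
            text = str(value)
        lines.append(f"export {key.upper()}={shlex.quote(text)}")
    return "\n".join(lines) + "\n"


def export(desc: dict, out_dir: Path, name: str) -> tuple[Path, Path]:
    """Write <name>.json (full configuration) and <name>.sh (shell exports)."""
    json_path = out_dir / f"{name}.json"
    sh_path = out_dir / f"{name}.sh"
    json_path.write_text(json.dumps(desc, indent=2))
    sh_path.write_text(to_shell_exports(desc))
    return json_path, sh_path


with tempfile.TemporaryDirectory() as tmp:
    json_path, sh_path = export(descriptor, Path(tmp), "demo")
    print(sh_path.read_text())
```

Keeping the JSON as the full record and the shell file as a flat projection of it means the shell side can be sourced by a Bash wrapper without a JSON parser.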

🚀 4. Run the Benchmark

Execute the generated descriptor using the wrapper script, which handles compilation, dispatch, and archival:

scripts/submit_wrapper.sh -f [path_to_test_sh_file]

This command will orchestrate the full benchmarking workflow — locally or on SLURM clusters — using your defined environment.
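If you want to launch the wrapper from a Python script, for example to queue several descriptors in a row, a thin `subprocess` helper is enough. This is a sketch under assumptions: only the `scripts/submit_wrapper.sh -f <file>` form comes from the text above, while the helper names, the `tests/demo.sh` path, and the choice to run from the repository root are hypothetical.

```python
import subprocess
from pathlib import Path


def build_submit_command(test_sh: Path,
                         wrapper: Path = Path("scripts/submit_wrapper.sh")) -> list[str]:
    """Return the argv for `scripts/submit_wrapper.sh -f <test_sh>`."""
    return [wrapper.as_posix(), "-f", test_sh.as_posix()]


def submit(test_sh: Path, repo_root: Path) -> int:
    """Run the wrapper from the repository root and return its exit code."""
    test_sh = test_sh.resolve()  # keep the path valid after changing cwd
    if not test_sh.is_file():
        raise FileNotFoundError(test_sh)
    result = subprocess.run(build_submit_command(test_sh), cwd=repo_root, check=False)
    return result.returncode


# Only build and print the command here; nothing is executed.
cmd = build_submit_command(Path("tests/demo.sh"))  # hypothetical file name
print(cmd)
```

Passing the command as a list rather than a single string avoids shell quoting problems when a path contains spaces.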

🧰 Optional: CLI Workflow (Legacy)

You can still invoke PICO directly via the CLI to explore options or run ad-hoc tests. To do so, after step 1 run:

scripts/submit_wrapper.sh --help

> ⚠️ Note: The CLI path is currently *partially maintained*; some flags may be deprecated as functionality transitions to the TUI.

Example CLI invocation:

scripts/submit_wrapper.sh \
  --location leonardo \
  --nodes 8 \
  --ntasks-per-node 32 \
  --collectives allreduce,allgather \
  --types int32,double \
  --sizes 64,1024,65536 \
  --segment-sizes 0 \
  --time 01:00:00 \
  --gpu-awareness no

  • Provide comma-separated lists for datatypes, message sizes, and segment sizes.
  • Use --gpu-awareness yes and --gpu-per-node to benchmark NCCL or CUDA-aware MPI collectives.
  • Pass --debug yes for quick validation runs with reduced iterations and debug builds.
  • When --compile-only yes is set, the script stops after building bin/pico_core and its GPU counterpart.
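When the same invocation has to be generated for many parameter sets, it can help to assemble the flags programmatically. The Python sketch below rebuilds the example command above from a dictionary; the flag names and values are the ones shown in that example, while the `build_cli_args` helper and the underscore-to-hyphen key convention are assumptions made for illustration.

```python
import shlex


def build_cli_args(options: dict) -> list[str]:
    """Turn {"sizes": [64, 1024]} into ["--sizes", "64,1024"], and so on."""
    args = []
    for key, value in options.items():
        flag = "--" + key.replace("_", "-")
        if isinstance(value, (list, tuple)):
            # The wrapper expects comma-separated lists.
            value = ",".join(str(v) for v in value)
        args += [flag, str(value)]
    return args


options = {
    "location": "leonardo",
    "nodes": 8,
    "ntasks_per_node": 32,
    "collectives": ["allreduce", "allgather"],
    "types": ["int32", "double"],
    "sizes": [64, 1024, 65536],
    "segment_sizes": [0],
    "time": "01:00:00",
    "gpu_awareness": "no",
}

command = ["scripts/submit_wrapper.sh", *build_cli_args(options)]
print(shlex.join(command))
```

Because the CLI path is only partially maintained, check `scripts/submit_wrapper.sh --help` before relying on any individual flag.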

💻 Dependencies

  • A C/C++ compiler and MPI implementation (Open MPI, MPICH, or Cray MPICH). CUDA-aware MPI or NCCL is optional for GPU runs.
  • (Optional) CUDA toolkit and a compatible NCCL build for GPU collectives.
  • Python 3.9+ with pip for the TUI and analysis utilities (pip install -r requirements.txt).
  • SLURM for cluster submissions; local mode is supported for functional testing.
  • Basic build tools (make) and a Bash-compatible shell.

🧠 Core Components

  • pico_core/ — C benchmarking driver that allocates buffers, times collectives, checks results, and writes output.
  • libpico/ — Library of custom collective algorithms and instrumentation helpers, selectable alongside vendor MPI/NCCL paths.
  • scripts/submit_wrapper.sh — Entry point that parses CLI flags or TUI exports, loads site modules, builds binaries, activates Python envs, and launches SLURM or local runs.
  • scripts/orchestrator.sh — Node-side runner that sweeps libraries, algorithm sets, GPU modes, message sizes, and datatypes while invoking metadata capture and optional compression.
  • config/ — Declarative environment, library, and algorithm descriptions consumed by the TUI and CLI (modules to load, compiler wrappers, task/GPU limits).
  • tui/ — Textual-based UI that guides the user through environment selection, library selection, algorithm mix, and exports the shell/JSON bundle for later submission.
  • plot/ — Python package and CLI (python -m plot …) that turns CSV summaries into…
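The measurement pattern described for pico_core (allocate buffers, time the collective, check the result) can be sketched without MPI. The Python example below simulates an allreduce over in-memory lists purely to show that pattern; it is not PICO's C driver, and the function names, warm-up count, and reported statistics are assumptions.

```python
import statistics
import time


def simulated_allreduce(rank_buffers):
    """Element-wise sum across ranks; every rank receives the same result."""
    reduced = [sum(column) for column in zip(*rank_buffers)]
    return [list(reduced) for _ in rank_buffers]


def benchmark(collective, rank_buffers, expected, warmup=2, iterations=10):
    """Time repeated calls and verify the last result against ground truth."""
    for _ in range(warmup):
        collective(rank_buffers)  # untimed warm-up runs
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        result = collective(rank_buffers)
        samples.append(time.perf_counter() - start)
    correct = all(buf == expected for buf in result)
    return {
        "median_s": statistics.median(samples),
        "min_s": min(samples),
        "correct": correct,
    }


ranks, count = 4, 1024
buffers = [[rank + 1] * count for rank in range(ranks)]
expected = [sum(range(1, ranks + 1))] * count  # 1 + 2 + 3 + 4 = 10 per element
report = benchmark(simulated_allreduce, buffers, expected)
print(report["correct"])
```

Giving each rank a distinct fill value makes the expected sum easy to compute by hand, so a wrong reduction shows up immediately in the correctness flag.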
