What does this repo signal mean?

OpenBMB (MiniCPM) published OpenBMB/PilotDeck (TypeScript). Repository signals like this can surface tooling, eval, infrastructure, or model-adjacent work before it appears in a launch post. High-signal details: repo OpenBMB/PilotDeck · language TypeScript · Visual framework for building AI agent workflows. onlylabs links this event to 1 captured evidence page and 6 related repo signals.

OpenBMB (MiniCPM) Repo: OpenBMB/PilotDeck

Captured source

GitHub · github.com/OpenBMB/PilotDeck

OpenBMB/PilotDeck repository metadata

Published May 22, 2026 · seen Jun 5 · captured Jun 11 · HTTP 200 · method: plain

OpenBMB/PilotDeck

Description: Task-oriented AI Agent productivity platform

Language: TypeScript

License: AGPL-3.0

Stars: 3129

Forks: 331

Open issues: 82

Created: 2026-05-22T06:50:28Z

Pushed: 2026-06-10T09:08:12Z

Default branch: main

Fork: no

Archived: no

README:

Task-oriented AI Agent productivity platform — redefining operational boundaries and memory evolution, one WorkSpace at a time.

---

News 🔥

[2026.05.28] PilotDeck is now open source! Visit our official website at pilotdeck.openbmb.cn. We welcome contributions, feedback, and stars from the community.

---

💡 About PilotDeck

PilotDeck is an open-source agent operating system designed around the concept of "WorkSpace". It is jointly developed and open-sourced by Tsinghua University THUNLP, ModelBest, OpenBMB, and AI9Stars. Targeting general-purpose, multi-task scenarios, PilotDeck is built to be a true *productivity tool* for the Agent era.

A wave of excellent AI Agent harnesses has emerged in recent years, each with its own focus: Claude Code / Cursor / Trae Solo brought model reasoning deep into the programming IDE; Claude Cowork introduced the notion of project-level isolation to desktop-side knowledge work; WorkBuddy connected agents to IM ecosystems such as WeCom and Feishu so AI is one message away.

When we shift the lens from "one-shot programming" or "immediate Q&A" to long-running, multi-project productivity work, however, several questions remain open:

When many projects run in parallel, can memory be white-box and traceable? When the AI gets something wrong, can you pinpoint which memory entry caused it and edit it directly — without starting a new chat from scratch?
Can token cost be tracked per task, so that running agents in the background actually becomes economically viable?
Can tasks of different difficulty automatically be matched to different models, instead of burning the flagship model on trivial calls?
When you step away from the keyboard, can the work keep moving? Can the agent proactively discover what's worth doing, report progress, and land results as files on disk?

PilotDeck is an incremental exploration around exactly these questions. It uses the WorkSpace as the fundamental unit — completely isolating files, memory and skills per project — and pairs it with three pillar capabilities: White-box Memory, Smart Routing and Always-on. The entire system natively supports the Model Context Protocol (MCP) and behaves consistently across front-ends (Web / CLI / IM).

✨ Key Highlights

WorkSpace-Level Isolation & Accretion

Every project gets its own file system, memory store and skill set. Parallel projects no longer interfere with one another, retrieval has a bounded scope, and skills accrete naturally as each task grows, so there is no global context pollution.
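
To make the isolation model concrete, here is a minimal TypeScript sketch, assuming each WorkSpace owns a private file map, memory list, and skill list. `WorkSpace`, `WorkSpaceRegistry`, `remember`, `addSkill`, and `retrieve` are hypothetical names for illustration, not PilotDeck's API, and keyword matching stands in for whatever retrieval PilotDeck actually uses. The point is that `retrieve` only scans the named WorkSpace, which is what gives retrieval a bounded scope.

```typescript
// Hypothetical per-WorkSpace isolation: each project owns its files, memory, and skills.
interface MemoryEntry {
  id: number;
  text: string;
}

interface WorkSpace {
  name: string;
  files: Record<string, string>; // path -> contents, private to this WorkSpace
  memory: MemoryEntry[];         // memory store scoped to this WorkSpace
  skills: string[];              // skills accrete here as the project's tasks grow
}

class WorkSpaceRegistry {
  private spaces: Record<string, WorkSpace> = {};
  private nextId = 1;

  create(name: string): WorkSpace {
    const ws: WorkSpace = { name, files: {}, memory: [], skills: [] };
    this.spaces[name] = ws;
    return ws;
  }

  get(name: string): WorkSpace {
    const ws = this.spaces[name];
    if (!ws) throw new Error(`Unknown WorkSpace: ${name}`);
    return ws;
  }

  remember(space: string, text: string): MemoryEntry {
    const entry: MemoryEntry = { id: this.nextId++, text };
    this.get(space).memory.push(entry);
    return entry;
  }

  addSkill(space: string, skill: string): void {
    const skills = this.get(space).skills;
    if (skills.indexOf(skill) < 0) skills.push(skill);
  }

  // Retrieval is bounded to one WorkSpace: other projects' memory is never scanned.
  retrieve(space: string, keyword: string): MemoryEntry[] {
    const needle = keyword.toLowerCase();
    return this.get(space).memory.filter((m) => m.text.toLowerCase().indexOf(needle) >= 0);
  }
}

const registry = new WorkSpaceRegistry();
registry.create("A");
registry.create("B");
registry.remember("A", "Use a two-column layout for reports");
registry.remember("B", "Use a casual tone for social posts");
registry.addSkill("A", "report-layout");

console.log(registry.retrieve("A", "layout").length); // 1
console.log(registry.retrieve("B", "layout").length); // 0: A's memory never reaches B
```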

Traceable White-box Memory

Memory generation, extraction, storage and retrieval are visible end to end. When the AI mis-remembers something, you can pinpoint and fix the offending entry. The built-in Dream Mode consolidates memory during idle periods and supports one-click rollback.
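
A minimal sketch of what white-box memory with rollback could look like, assuming each entry carries its WorkSpace, a timestamp, a source, and a pin flag, and that consolidation snapshots the store before rewriting it. The `MemoryStore` class and its methods are hypothetical, and the duplicate-merging `consolidate` step is only a placeholder: the README does not say how Dream Mode actually consolidates memory.

```typescript
// Hypothetical white-box memory store: every entry is inspectable and editable,
// and consolidation can be undone. Not PilotDeck's actual implementation.
interface MemoryRecord {
  id: number;
  workspace: string;
  text: string;
  createdAt: string; // ISO timestamp, so "when was this stored" is visible
  source: string;    // where the entry was extracted from (conversation, file, ...)
  pinned: boolean;   // pinned entries are never dropped by consolidation
}

class MemoryStore {
  private entries: MemoryRecord[] = [];
  private snapshots: MemoryRecord[][] = [];
  private nextId = 1;

  add(workspace: string, text: string, source: string): MemoryRecord {
    const e: MemoryRecord = {
      id: this.nextId++, workspace, text, source,
      createdAt: new Date().toISOString(), pinned: false,
    };
    this.entries.push(e);
    return e;
  }

  list(workspace: string): MemoryRecord[] {
    return this.entries.filter((e) => e.workspace === workspace);
  }

  edit(id: number, text: string): void { this.find(id).text = text; }
  pin(id: number): void { this.find(id).pinned = true; }
  remove(id: number): void { this.entries = this.entries.filter((e) => e.id !== id); }

  // Placeholder for Dream Mode: drop case-insensitive duplicates within each
  // WorkSpace, keeping the first copy and every pinned entry.
  consolidate(): void {
    // Snapshot copies (not references) so later edits cannot alter the backup.
    this.snapshots.push(this.entries.map((e) => ({ ...e })));
    const seen: Record<string, boolean> = {};
    this.entries = this.entries.filter((e) => {
      const key = e.workspace + "\u0000" + e.text.trim().toLowerCase();
      if (e.pinned || !seen[key]) {
        seen[key] = true;
        return true;
      }
      return false;
    });
  }

  // One-click rollback: restore the state saved before the last consolidation.
  rollback(): boolean {
    const prev = this.snapshots.pop();
    if (!prev) return false;
    this.entries = prev;
    return true;
  }

  private find(id: number): MemoryRecord {
    for (const e of this.entries) {
      if (e.id === id) return e;
    }
    throw new Error(`No memory entry with id ${id}`);
  }
}

const store = new MemoryStore();
const a = store.add("A", "Report title uses 18pt bold", "chat#1");
store.add("A", "report title uses 18pt bold", "chat#2");
store.pin(a.id);
store.consolidate();
console.log(store.list("A").length); // 1
store.rollback();
console.log(store.list("A").length); // 2
```

Copying entries into the snapshot, rather than storing references, is the design choice that makes rollback trustworthy: otherwise a later `edit` would silently rewrite the backup as well.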

Smart Routing & Cost Optimization

Task difficulty is detected automatically: complex calls go to flagship models (e.g. Claude 3.5 Sonnet / GPT-4o), while simple ones drop to lighter models. By co-orchestrating on-device and cloud models and matching each call to a suitable model, PilotDeck substantially reduces token spend without sacrificing quality (see Real-world Numbers below).
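
The routing idea can be sketched as a difficulty score plus a threshold, with a ledger that attributes token cost to each task (the per-task tracking raised in the questions above). Everything here is an assumption for illustration: the model names, per-token prices, the length-and-planning heuristic, and the `route`/`CostLedger` names are hypothetical, and PilotDeck's real difficulty detector is not described in the README.

```typescript
// Hypothetical difficulty-based router with a per-task cost ledger.
// Model names, prices, and the scoring heuristic are illustrative assumptions.
interface ModelTier {
  name: string;
  usdPerMillionTokens: number;
}

const FLAGSHIP: ModelTier = { name: "flagship-model", usdPerMillionTokens: 15 };
const LIGHT: ModelTier = { name: "light-model", usdPerMillionTokens: 3 };

interface ModelCall {
  taskId: string;         // the task this call belongs to, for cost attribution
  prompt: string;
  needsPlanning: boolean; // e.g. a planning checkpoint in a longer workflow
}

// Crude stand-in for difficulty detection, scored in [0, 1]:
// planning steps add 0.6 and prompt length adds up to 0.4.
function difficulty(call: ModelCall): number {
  const planningScore = call.needsPlanning ? 0.6 : 0;
  const lengthScore = Math.min(call.prompt.length / 4000, 0.4);
  return planningScore + lengthScore;
}

// Send hard calls to the flagship tier and everything else to the light tier.
function route(call: ModelCall, threshold = 0.5): ModelTier {
  return difficulty(call) >= threshold ? FLAGSHIP : LIGHT;
}

// Accumulates USD cost per task so background work can be budgeted task by task.
class CostLedger {
  private perTask: Record<string, number> = {};

  record(call: ModelCall, model: ModelTier, tokens: number): void {
    const cost = (tokens / 1e6) * model.usdPerMillionTokens;
    this.perTask[call.taskId] = (this.perTask[call.taskId] || 0) + cost;
  }

  costOf(taskId: string): number {
    return this.perTask[taskId] || 0;
  }
}

const plan: ModelCall = { taskId: "campaign", prompt: "Plan a week of posts", needsPlanning: true };
const polish: ModelCall = { taskId: "campaign", prompt: "Polish this caption", needsPlanning: false };

const ledger = new CostLedger();
ledger.record(plan, route(plan), 200000);
ledger.record(polish, route(polish), 500000);

console.log(route(plan).name);   // flagship-model
console.log(route(polish).name); // light-model
console.log(ledger.costOf("campaign").toFixed(2)); // 4.50
```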

Always-on Background Execution

PilotDeck breaks the "you ask, it answers" loop: after you sign off, the agent keeps discovering candidate tasks, running long-horizon monitors, and finally lands deliverables as local files with a summary report waiting for you.
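
The always-on behavior amounts to a discover, execute, write, summarize loop. The sketch below assumes caller-supplied `discover` and `execute` functions and a per-tick task budget; `BackgroundAgent`, the `deliverables/` paths, and the in-memory `disk` map (used instead of real file I/O so the example runs anywhere) are all hypothetical.

```typescript
// Hypothetical always-on loop: discover -> execute -> land files -> summarize.
// The "disk" is an in-memory map so the sketch runs anywhere; names are illustrative.
interface Candidate {
  title: string;
  priority: number; // higher = more worth doing
}

type Discover = () => Candidate[];
type Execute = (c: Candidate) => string; // returns the deliverable's contents

class BackgroundAgent {
  readonly disk: Record<string, string> = {};
  private done: Record<string, boolean> = {};

  constructor(private discover: Discover, private execute: Execute, private budget: number) {}

  // One iteration of the loop; a scheduler would call this while the user is away.
  tick(): string {
    const todo = this.discover()
      .filter((c) => !this.done[c.title])
      .sort((a, b) => b.priority - a.priority)
      .slice(0, this.budget);

    const lines: string[] = [];
    for (const c of todo) {
      const path = `deliverables/${c.title.replace(/\s+/g, "-").toLowerCase()}.md`;
      this.disk[path] = this.execute(c);
      this.done[c.title] = true;
      lines.push(`- ${c.title} -> ${path}`);
    }
    const summary = lines.length > 0
      ? `Completed ${lines.length} task(s):\n${lines.join("\n")}`
      : "Nothing new to do.";
    this.disk["deliverables/summary.md"] = summary;
    return summary;
  }
}

const agent = new BackgroundAgent(
  () => [
    { title: "Weekly metrics", priority: 2 },
    { title: "Competitor scan", priority: 5 },
    { title: "Inbox triage", priority: 1 },
  ],
  (c) => `# ${c.title}\n(draft produced in the background)`,
  2, // at most two tasks per tick
);
console.log(agent.tick()); // summary lists "Competitor scan" and "Weekly metrics"
```

The per-tick budget is one simple way to keep background token spend bounded; remaining candidates are picked up on later ticks.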

📊 Real-world Numbers

The three pillar capabilities have shown clear advantages in production-grade workflows:

1. Smart Routing — ~70% cost savings on social-media workloads

In Xiaohongshu-style social-media operations, enabling Smart Routing automatically demotes simple polishing / layout tasks to a sub-agent (e.g. Sonnet 4.5) and only invokes Opus 4.5 at planning checkpoints:

| Setup | Model configuration | Cost | Multiplier |
|---|---|---|---|
| Smart Routing ON | Opus 4.5 (main) + Sonnet 4.5 (sub) | $2.83 | 1.1× |
| Smart Routing OFF | All Opus 4.5 (main + sub) | $12.58 | 5.0× |
| Monolithic | Single Opus 4.5 long-react (estimated) | $12.20 | 4.8× |
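
The headline percentage can be checked against the dollar figures in the table. This short computation uses only the three costs listed above; by those numbers, routing is about 77% cheaper than either baseline, so the ~70% headline is, if anything, conservative. The Multiplier column's reference point is not stated in the README, so it is not used here.

```typescript
// Costs copied from the social-media table above (USD).
const routedCost = 2.83;
const baselines: Record<string, number> = {
  "Smart Routing OFF (all Opus 4.5)": 12.58,
  "Monolithic Opus 4.5 (estimated)": 12.2,
};

// Percentage saved by routing relative to a baseline cost.
function savingsPct(routed: number, baseline: number): number {
  return (1 - routed / baseline) * 100;
}

for (const label of Object.keys(baselines)) {
  const pct = savingsPct(routedCost, baselines[label]);
  console.log(`${label}: ${pct.toFixed(1)}% cheaper with routing`);
}
// Smart Routing OFF (all Opus 4.5): 77.5% cheaper with routing
// Monolithic Opus 4.5 (estimated): 76.8% cheaper with routing
```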

2. Smart Routing — 1/6 the cost while beating frontier models on hard tasks

The research team benchmarked 7 complex tasks (multilingual podcast push, multi-source data reports, domain-specific literature review, codebase architecture docs, etc.). The "strong main + light sub" routing setup matches or beats the frontier single-model setup at roughly one-sixth of the cost:

| Setting | Score | Cost |
|---|---|---|
| MiniMax-M2.7 single-agent | 37.1 | $1.90 |
| Claude Sonnet 4.6 single-agent | 69.1 | $18.36 |
| Sonnet 4.6 (main) + MiniMax-M2.7 (sub) | 70.6 | $3.15 |
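
The same kind of check works for the benchmark table: dividing score by cost gives a rough points-per-dollar measure, and the cost ratio confirms the "1/6" headline (about 1/5.8). The inputs are exactly the table's values; the points-per-dollar metric is an illustrative choice, not one the README reports.

```typescript
// Scores and costs copied from the 7-task benchmark table above.
interface BenchmarkRun {
  setting: string;
  score: number;
  costUsd: number;
}

const runs: BenchmarkRun[] = [
  { setting: "MiniMax-M2.7 single-agent", score: 37.1, costUsd: 1.9 },
  { setting: "Claude Sonnet 4.6 single-agent", score: 69.1, costUsd: 18.36 },
  { setting: "Sonnet 4.6 (main) + MiniMax-M2.7 (sub)", score: 70.6, costUsd: 3.15 },
];

// Score points obtained per dollar spent: a simple cost-efficiency measure.
function pointsPerDollar(r: BenchmarkRun): number {
  return r.score / r.costUsd;
}

for (const r of runs) {
  console.log(`${r.setting}: ${pointsPerDollar(r).toFixed(1)} points/$`);
}
// MiniMax-M2.7 single-agent: 19.5 points/$
// Claude Sonnet 4.6 single-agent: 3.8 points/$
// Sonnet 4.6 (main) + MiniMax-M2.7 (sub): 22.4 points/$

// Cost of the routed setup relative to the single-agent Sonnet 4.6 run.
const ratio = runs[2].costUsd / runs[1].costUsd;
console.log(`routed / Sonnet-only cost = ${ratio.toFixed(3)} (about 1/${(1 / ratio).toFixed(1)})`);
// routed / Sonnet-only cost = 0.172 (about 1/5.8)
```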

3. White-box Memory — layout & tone never bleed across projects

In black-box agents, mixing tasks in a shared context pool inevitably pollutes memory. PilotDeck's WorkSpace-scoped white-box memory addresses this end-to-end:

| Dimension | Current AI agents (black-box) | PilotDeck (white-box) |
|---|---|---|
| Visibility | You can't see what the AI remembers, only what it outputs | View every memory entry: what was stored, when, and in which WorkSpace |
| Control | Once written, memory can't be edited or removed | Edit or delete entries; pin critical decisions so they don't drift |
| Traceability | When something goes wrong, you can't find the root cause | Generation → extraction → storage → retrieval, all auditable |
| Isolation | One shared pool, so projects bleed into each other | Scoped per WorkSpace; A's memory never reaches B |
| Reversibility | After compression, the original is gone | Dream Mode supports one-click rollback to the prior state |

---

🖥️ UI & Demo

PilotDeck ships an out-of-the-box Web UI with full WorkSpace management, white-box memory editing, and visualization of multi-agent collaboration.

Use Cases

> All demos below are generated entirely by edge-side models via PilotDeck's Smart Routing — no cloud-side frontier model required.

Work Document Generation

> *"Survey the Chinese LLM application market and turn it...

Excerpt shown — open the source for the full document.

Notability: 6.0/10

Solid new repo with good traction.