Snowflake-Labs/agent-world-model
Python
Description: Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning
Language: Python
Stars: 375
Forks: 42
Open issues: 1
Created: 2026-02-08T03:02:35Z
Pushed: 2026-05-28T21:36:00Z
Default branch: main
Fork: no
Archived: no
README: Agent World Model
Infinity Synthetic Environments for Agentic Reinforcement Learning
Zhaoyang Wang¹, Canwen Xu², Boyi Liu², Yite Wang², Siwei Han¹,
Zhewei Yao², Huaxiu Yao¹, Yuxiong He²
¹UNC-Chapel Hill, ²Snowflake AI Research
Agent World Model (AWM) is a fully synthetic environment generation pipeline that synthesizes 1,000 executable, SQL database-backed tool-use environments, exposed via a unified MCP interface, for large-scale multi-turn agentic reinforcement learning.
---
📣 News
- May 1, 2026: AWM was accepted to ICML 2026 🎉, and its infrastructure was merged into meta-pytorch/OpenEnv to support large-scale agentic RL training! Try the live demo on the Hugging Face Space 🤗!
- Mar 16, 2026: We added a verification demo; see the [Verification](#verification) section!
- Feb 10, 2026: We open-sourced the synthesis pipeline, the 1,000 synthesized environments, and the RL-trained agents on Hugging Face!
🔮 Resources
We released the 1,000 synthesized executable environments and their corresponding tasks, databases, and verification code on Hugging Face. Please check out the Hugging Face repo at Snowflake/AgentWorldModel-1K. You can freely interact with these environments online at the HuggingFace Space.
| Resource | Link |
|----------|------|
| 📄 Paper | 📄 arxiv.org/abs/2602.10090 |
| 💻 Code | 💻 Snowflake-Labs/agent-world-model |
| ⚓️ RL Infra | ⚓️ meta-pytorch/OpenEnv |
| 🛜 Live Demo | 🤗 HuggingFace Space |
| 📦 AgentWorldModel-1K | 🤗 Snowflake/AgentWorldModel-1K |
| 🤖 Arctic-AWM-4B | 🤗 Snowflake/Arctic-AWM-4B |
| 🤖 Arctic-AWM-8B | 🤗 Snowflake/Arctic-AWM-8B |
| 🤖 Arctic-AWM-14B | 🤗 Snowflake/Arctic-AWM-14B |
To use our synthesized environments directly, download them with:

    hf download Snowflake/AgentWorldModel-1K --repo-type dataset --local-dir ./outputs/

Then you can skip to [Environment Management](#environment-management) and [Agent Demo](#agent-demo) to start using the environments locally.
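
If you prefer to script the download, the same step can be sketched in Python with the huggingface_hub library. This is a minimal sketch, not part of the AWM codebase: the function name download_awm_1k is hypothetical, and it assumes huggingface_hub is installed in your environment.

```python
from pathlib import Path

# Dataset repo named in the Resources table above.
REPO_ID = "Snowflake/AgentWorldModel-1K"


def download_awm_1k(local_dir: str = "./outputs/") -> Path:
    """Download the AgentWorldModel-1K dataset repo into local_dir.

    Hypothetical helper; mirrors `hf download ... --repo-type dataset`.
    """
    # Imported lazily so this module loads even without huggingface_hub.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",
        local_dir=local_dir,
    )
    return Path(path)
```

Calling download_awm_1k() fetches the full repository over the network, so expect it to take a while on first use.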
📦 Setup
Run uv sync to set up the Python environment, then set your LLM API credentials:

    # OpenAI or any other compatible services
    export AWM_SYN_LLM_PROVIDER="openai"
    export OPENAI_API_KEY="your-api-key"
    # optional, if you are using a custom base url
    export OPENAI_BASE_URL="http://xxxxxx"

    # Azure OpenAI
    export AWM_SYN_LLM_PROVIDER="azure"
    export AZURE_ENDPOINT_URL="https://your-endpoint.openai.azure.com/"
    export AZURE_OPENAI_API_KEY="your-api-key"

    # configure the model/LLM for synthesis
    export AWM_SYN_OVERRIDE_MODEL="your-model-name such as gpt-5"

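
Before launching a long synthesis run, it can help to confirm that the variables for your chosen provider are set. The sketch below is a hypothetical pre-flight check, not how AWM itself reads its configuration. The variable names come from the exports above; the assumption that each provider needs exactly these variables is ours.

```python
import os
from typing import Mapping, Optional

# Variables per provider, taken from the exports above.
# Assumption: these are the ones each provider needs; AWM may check more.
REQUIRED = {
    "openai": ["OPENAI_API_KEY"],
    "azure": ["AZURE_ENDPOINT_URL", "AZURE_OPENAI_API_KEY"],
}


def missing_llm_settings(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the names of unset variables for the selected provider."""
    env = os.environ if env is None else env
    provider = env.get("AWM_SYN_LLM_PROVIDER", "")
    if provider not in REQUIRED:
        # Provider is unset or not one of the two shown in this README.
        return ["AWM_SYN_LLM_PROVIDER"]
    return [name for name in REQUIRED[provider] if not env.get(name)]
```

An empty list means the provider-specific variables are present; it says nothing about whether the key itself is valid.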
🔥 Synthesis
AWM CLI
All synthesis is exposed through the awm command-line tool. Run awm --help to see available commands:

    awm --help

    Available commands:
      gen     Synthesis pipeline commands
      ├── scenario   Generate scenario names from seed set
      ├── task       Generate user tasks per scenario
      ├── db         Generate database schema and create SQLite databases
      ├── sample     Generate and insert sample data into databases
      ├── spec       Generate API specification for each scenario
      ├── env        Generate MCP environment code
      ├── verifier   Generate verification code for tasks
      └── all        Run the full synthesis pipeline
      env     Environment management commands
      ├── start      Start MCP server for a scenario
      ├── check      Check if an MCP server is running and list its tools
      ├── check_all  Check all generated environments
      └── reset_db   Reset databases to initial state
      agent   Run a tool-use agent to solve a task by interacting with the environment
      verify  Verify agent run outputs using code-augmented LLM-as-a-Judge or purely code-based Judge
      bench   Run evaluation on mcp-adapted-bench including bfclv3, tau2, and mcp-universe

Append --help to any command to see its options, e.g. awm gen task --help.
Step 1: Scenario Generation
We start with a seed set of scenarios and generate 1,000 unique scenario descriptions. Note that only the names are used as seeds; the descriptions are included in the seed file for ease of use.

    export EMBEDDING_OPENAI_API_KEY="your-api-key for the embedding model"
    awm gen scenario \
        --input_path outputs/seed_scenario.jsonl \
        --output_path outputs/gen_scenario.jsonl \
        --target_count 1000

Step 2: Task Generation
We generate 10 tasks per scenario; these tasks also serve as the requirements for building the environment.

    awm gen task \
        --input outputs/gen_scenario.jsonl \
        --output outputs/gen_tasks.jsonl

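
Since later steps build on the task file, a quick sanity check that every scenario received its 10 tasks may be useful. The sketch below assumes one JSON object per line with a scenario field; that layout and the field name are hypothetical, because this README does not document the JSONL schema. Adjust key to match the real output.

```python
import json
from collections import Counter
from typing import Iterable


def tasks_per_scenario(lines: Iterable[str], key: str = "scenario") -> Counter:
    """Count JSONL records per scenario. `key` is a hypothetical field name."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        counts[record[key]] += 1
    return counts


def scenarios_off_target(counts: Counter, expected: int = 10) -> dict:
    """Return scenarios whose task count differs from the expected 10."""
    return {name: n for name, n in counts.items() if n != expected}
```

To run it on the generated file, pass an open file handle, for example tasks_per_scenario(open("outputs/gen_tasks.jsonl")).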
Step 3: Database Synthesis
We define the database schema and populate the initial state so that the database fully supports the generated tasks.

    # database schema
    awm gen db \
        --input outputs/gen_tasks.jsonl \
        --output outputs/gen_db.jsonl

    # sample data for initial state
    awm gen sample \
        --input_task outputs/gen_tasks.jsonl \
        --input_db outputs/gen_db.jsonl \
        --output outputs/gen_sample.jsonl

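
To make the idea of a schema plus an initial state concrete, here is a toy sketch using Python's built-in sqlite3 module. The orders table, its rows, and the cancel_order action are all invented for illustration; they are not taken from any AWM-generated environment.

```python
import sqlite3

# Hypothetical schema for a toy scenario (not from AWM output).
SCHEMA = """
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer TEXT NOT NULL,
    status TEXT NOT NULL
);
"""

# Hypothetical sample data forming the initial state.
SAMPLE_ROWS = [(1, "alice", "pending"), (2, "bob", "shipped")]


def create_initial_db() -> sqlite3.Connection:
    """Create an in-memory database in its initial state."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", SAMPLE_ROWS)
    conn.commit()
    return conn


def cancel_order(conn: sqlite3.Connection, order_id: int) -> bool:
    """Tool-style action: cancel a pending order, report whether a row changed."""
    cur = conn.execute(
        "UPDATE orders SET status = 'cancelled' "
        "WHERE id = ? AND status = 'pending'",
        (order_id,),
    )
    conn.commit()
    return cur.rowcount == 1
```

Calling create_initial_db() again yields a fresh copy of the initial state, which is the same idea the reset_db command in the CLI listing describes for the real databases.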
Step 4: Interface Synthesis
We first generate an API spec, which then guides generation of the environment's Python code behind the MCP interface.

    # API spec (interface schema)
    awm gen spec \
        --input_task outputs/gen_tasks.jsonl \
        --input_db outputs/gen_db.jsonl \
        --output outputs/gen_spec.jsonl

    # Environment code
    awm gen env \
        --input_spec outputs/gen_spec.jsonl \
        --input_db outputs/gen_db.jsonl \
        --output outputs/gen_envs.jsonl

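
Steps 1 through 4 chain together through their input and output files, so they can be driven from a small script. The sketch below only assembles the commands shown above and optionally runs them; the function names are hypothetical. The CLI also lists awm gen all for running the full pipeline, which is likely the simpler choice when you do not need per-step control.

```python
import subprocess


def build_pipeline_commands(out_dir: str = "outputs") -> list[list[str]]:
    """Return the Step 1-4 commands from this README as argv lists."""

    def p(name: str) -> str:
        return f"{out_dir}/{name}"

    return [
        ["awm", "gen", "scenario", "--input_path", p("seed_scenario.jsonl"),
         "--output_path", p("gen_scenario.jsonl"), "--target_count", "1000"],
        ["awm", "gen", "task", "--input", p("gen_scenario.jsonl"),
         "--output", p("gen_tasks.jsonl")],
        ["awm", "gen", "db", "--input", p("gen_tasks.jsonl"),
         "--output", p("gen_db.jsonl")],
        ["awm", "gen", "sample", "--input_task", p("gen_tasks.jsonl"),
         "--input_db", p("gen_db.jsonl"), "--output", p("gen_sample.jsonl")],
        ["awm", "gen", "spec", "--input_task", p("gen_tasks.jsonl"),
         "--input_db", p("gen_db.jsonl"), "--output", p("gen_spec.jsonl")],
        ["awm", "gen", "env", "--input_spec", p("gen_spec.jsonl"),
         "--input_db", p("gen_db.jsonl"), "--output", p("gen_envs.jsonl")],
    ]


def run_pipeline(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)
```

Running run_pipeline(build_pipeline_commands()) prints the commands without executing anything, which is a cheap way to review paths before spending LLM credits.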
Step 5: Verification Synthesis
We provide two options for verification:
1. code-augmented LLM-as-a-Judge (sql)
2. purely code-based Judge (code)

    awm gen verifier \
        --mode sql \
        --input_task outputs/gen_tasks.jsonl \…

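
To give a feel for what a purely code-based check can look like, the sketch below compares database snapshots taken before and after an agent run. It is illustrative only: this excerpt does not show the verifiers AWM generates, and the orders table, the task ("cancel order 1, leave everything else untouched"), and all function names are hypothetical.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a toy in-memory database (hypothetical schema and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "alice", "pending"), (2, "bob", "shipped")],
    )
    return conn


def snapshot(conn: sqlite3.Connection, table: str) -> dict:
    """Return all rows keyed by the first column. Table name is trusted here."""
    rows = conn.execute(f"SELECT * FROM {table} ORDER BY 1").fetchall()
    return {row[0]: row for row in rows}


def verify_task(before: dict, after: dict) -> bool:
    """Pass only if order 1 is cancelled and no other row changed."""
    expected = dict(before)
    order_id, customer, _ = before[1]
    expected[1] = (order_id, customer, "cancelled")
    return after == expected
```

Comparing whole snapshots, rather than only the target row, also catches unintended side effects such as an agent modifying a second order.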
Notability: 5.0/10. New research repo with moderate traction.