Repo: Snowflake (Arctic), published Feb 8, 2026

Snowflake-Labs/agent-world-model

Python


Description: Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning

Language: Python

Stars: 375

Forks: 42

Open issues: 1

Created: 2026-02-08T03:02:35Z

Pushed: 2026-05-28T21:36:00Z

Default branch: main

Fork: no

Archived: no

README: Agent World Model

Infinity Synthetic Environments for Agentic Reinforcement Learning

Zhaoyang Wang¹, Canwen Xu², Boyi Liu², Yite Wang², Siwei Han¹, Zhewei Yao², Huaxiu Yao¹, Yuxiong He²

¹UNC-Chapel Hill, ²Snowflake AI Research

Agent World Model (AWM) is a fully synthetic environment generation pipeline that synthesizes 1,000 executable, SQL database-backed tool-use environments, exposed via a unified MCP (Model Context Protocol) interface, for large-scale multi-turn agentic reinforcement learning.
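To make "SQL database-backed tool-use environment" concrete, here is a toy sketch: a couple of tools that read and write an in-memory SQLite database and are dispatched by name with JSON-style arguments. The `orders` table, the tool names, and the `call_tool` dispatcher are hypothetical illustrations, not AWM's generated code or its MCP server.

```python
import sqlite3
from typing import Any, Callable

# Hypothetical toy environment: all state lives in SQLite (not AWM's generated code).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL, status TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO orders (item, status) VALUES (?, ?)",
    [("laptop", "pending"), ("mouse", "shipped")],
)
conn.commit()


def list_orders(status: str) -> list[dict[str, Any]]:
    """Read-only tool: return the orders that have the given status."""
    rows = conn.execute("SELECT id, item FROM orders WHERE status = ? ORDER BY id", (status,))
    return [{"id": r[0], "item": r[1]} for r in rows]


def cancel_order(order_id: int) -> dict[str, Any]:
    """State-changing tool: mark one order as cancelled."""
    cur = conn.execute("UPDATE orders SET status = 'cancelled' WHERE id = ?", (order_id,))
    conn.commit()
    return {"ok": cur.rowcount == 1}


# One name-based entry point, loosely mirroring how an MCP server routes tool calls.
TOOLS: dict[str, Callable[..., Any]] = {"list_orders": list_orders, "cancel_order": cancel_order}


def call_tool(name: str, arguments: dict[str, Any]) -> Any:
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**arguments)


print(call_tool("list_orders", {"status": "pending"}))
print(call_tool("cancel_order", {"order_id": 1}))
print(call_tool("list_orders", {"status": "pending"}))
```

In this toy, every tool call is a SQL read or write, so the whole environment state is just the database contents, which keeps resetting and state-based checking simple.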

---

📣 News

  • May 1, 2026: AWM was accepted to ICML 2026 🎉, and its infrastructure has been merged into meta-pytorch/OpenEnv to support large-scale agentic RL training! Try the live demo on the Hugging Face Space 🤗!
  • Mar 16, 2026: We added the verification demo; see the [Verification](#verification) section!
  • Feb 10, 2026: We open-sourced the synthesis pipeline, the 1,000 synthesized environments, and the RL-trained agents on Hugging Face!

🔮 Resources

We released the 1,000 synthesized executable environments, along with the corresponding tasks, databases, and verification, on Hugging Face. Please check out the Hugging Face repo at Snowflake/AgentWorldModel-1K. You can also interact with these environments online in the HuggingFace Space.

| Resource | Link |
|----------|------|
| 📄 Paper | 📄 arxiv.org/abs/2602.10090 |
| 💻 Code | 💻 Snowflake-Labs/agent-world-model |
| ⚓️ RL Infra | ⚓️ meta-pytorch/OpenEnv |
| 🛜 Live Demo | 🤗 HuggingFace Space |
| 📦 AgentWorldModel-1K | 🤗 Snowflake/AgentWorldModel-1K |
| 🤖 Arctic-AWM-4B | 🤗 Snowflake/Arctic-AWM-4B |
| 🤖 Arctic-AWM-8B | 🤗 Snowflake/Arctic-AWM-8B |
| 🤖 Arctic-AWM-14B | 🤗 Snowflake/Arctic-AWM-14B |

To use the synthesized environments directly, download them with:

hf download Snowflake/AgentWorldModel-1K --repo-type dataset --local-dir ./outputs/

Then you can skip to [Environment Management](#environment-management) and [Agent Demo](#agent-demo) to start using the environments locally.
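If you prefer to download from Python, `huggingface_hub.snapshot_download` with `repo_type="dataset"` should fetch the same files as the `hf download` command above. This is a sketch assuming the `huggingface_hub` package (which provides the `hf` CLI) is installed; the `download_awm_1k` wrapper name is our own.

```python
from pathlib import Path


def download_awm_1k(local_dir: str = "./outputs/") -> Path:
    """Fetch the Snowflake/AgentWorldModel-1K dataset repo into local_dir."""
    # Imported lazily so this module can be loaded without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id="Snowflake/AgentWorldModel-1K",
        repo_type="dataset",  # equivalent of --repo-type dataset
        local_dir=local_dir,  # equivalent of --local-dir ./outputs/
    )
    return Path(path)
```

Calling `download_awm_1k()` places the dataset under `./outputs/`, the same location the CLI command uses.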

📦 Setup

Run uv sync to set up the Python environment, then set your LLM API credentials:

# OpenAI or any other compatible services
export AWM_SYN_LLM_PROVIDER="openai"
export OPENAI_API_KEY="your-api-key"
# optional, if you are using a custom base url
export OPENAI_BASE_URL="http://xxxxxx"

# Azure OpenAI
export AWM_SYN_LLM_PROVIDER="azure"
export AZURE_ENDPOINT_URL="https://your-endpoint.openai.azure.com/"
export AZURE_OPENAI_API_KEY="your-api-key"

# configure the model/LLM for synthesis
export AWM_SYN_OVERRIDE_MODEL="your-model-name"  # e.g., gpt-5
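Before a long synthesis run, it can help to fail fast on missing credentials. A minimal sketch that checks the variables listed above for the chosen provider; `check_llm_env` is a hypothetical helper, not part of `awm`, and it only knows the two providers shown here and assumes the model override should be set.

```python
import os
from collections.abc import Mapping

# Variables each provider needs, taken from the setup instructions above.
REQUIRED = {
    "openai": ["OPENAI_API_KEY"],  # OPENAI_BASE_URL is optional
    "azure": ["AZURE_ENDPOINT_URL", "AZURE_OPENAI_API_KEY"],
}


def check_llm_env(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return a list of problems; an empty list means the setup looks complete."""
    provider = env.get("AWM_SYN_LLM_PROVIDER", "")
    if provider not in REQUIRED:
        return [f"AWM_SYN_LLM_PROVIDER should be one of {sorted(REQUIRED)}, got {provider!r}"]
    problems = [f"{name} is not set" for name in REQUIRED[provider] if not env.get(name)]
    # Assumption: we treat the synthesis model override as required.
    if not env.get("AWM_SYN_OVERRIDE_MODEL"):
        problems.append("AWM_SYN_OVERRIDE_MODEL is not set")
    return problems
```

For example, `print(check_llm_env() or "LLM settings look complete")` reports on the current shell environment.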

🔥 Synthesis

AWM CLI

All synthesis is exposed through the awm command-line tool. Run awm --help to see available commands:

awm --help

Available commands:
gen Synthesis pipeline commands
├── scenario Generate scenario names from seed set
├── task Generate user tasks per scenario
├── db Generate database schema and create SQLite databases
├── sample Generate and insert sample data into databases
├── spec Generate API specification for each scenario
├── env Generate MCP environment code
├── verifier Generate verification code for tasks
└── all Run the full synthesis pipeline
env Environment management commands
├── start Start MCP server for a scenario
├── check Check if an MCP server is running and list its tools
├── check_all Check all generated environments
└── reset_db Reset databases to initial state
agent Run a tool-use agent to solve a task by interacting with the environment
verify Verify agent run outputs using code-augmented LLM-as-a-Judge or purely code-based Judge
bench Run evaluation on mcp-adapted-bench including bfclv3, tau2, and mcp-universe

Append --help to any command to see its options, e.g. awm gen task --help.

Step 1: Scenario Generation

We start with a seed set of scenarios and generate 1,000 unique scenario descriptions. Note that only the names are used as seeds; the descriptions are included in the seed file for ease of use.

export EMBEDDING_OPENAI_API_KEY="your-api-key"  # API key for the embedding model

awm gen scenario \
--input_path outputs/seed_scenario.jsonl \
--output_path outputs/gen_scenario.jsonl \
--target_count 1000
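The README does not say how the embedding model is used in this step. One common pattern, shown here purely as an assumption, is to drop a generated scenario when its embedding is too close to one already kept. A minimal plain-Python sketch; the vectors and the 0.9 threshold are illustrative, and a real run would obtain vectors from the embedding API.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dedup_by_embedding(
    names: list[str], vectors: list[list[float]], threshold: float = 0.9
) -> list[str]:
    """Greedily keep names whose vector is below `threshold` similarity to every kept one."""
    kept_names: list[str] = []
    kept_vecs: list[list[float]] = []
    for name, vec in zip(names, vectors):
        if all(cosine(vec, k) < threshold for k in kept_vecs):
            kept_names.append(name)
            kept_vecs.append(vec)
    return kept_names


names = ["online bookstore", "e-book shop", "flight booking"]
vectors = [[1.0, 0.1, 0.0], [0.95, 0.15, 0.0], [0.0, 0.2, 1.0]]
print(dedup_by_embedding(names, vectors))
```

The greedy pass is order-dependent: the first of two near-duplicates wins, which is usually acceptable when the goal is simply to reach a target count of distinct scenarios.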

Step 2: Task Generation

We generate 10 tasks per scenario; these tasks also serve as the requirements for building the environment.

awm gen task \
--input outputs/gen_scenario.jsonl \
--output outputs/gen_tasks.jsonl
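Since each scenario should end up with 10 tasks, a quick sanity check on `outputs/gen_tasks.jsonl` can catch partial generations. The record layout is not documented here, so the `scenario` and `tasks` field names below are hypothetical; adjust them to whatever the file actually contains.

```python
import json
from collections.abc import Iterable


def short_scenarios(
    lines: Iterable[str],
    expected: int = 10,
    scenario_key: str = "scenario",  # hypothetical field name
    tasks_key: str = "tasks",  # hypothetical field name
) -> dict[str, int]:
    """Map scenario -> task count for every JSONL record with fewer than `expected` tasks."""
    short: dict[str, int] = {}
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        count = len(record.get(tasks_key, []))
        if count < expected:
            short[record[scenario_key]] = count
    return short
```

Usage: `with open("outputs/gen_tasks.jsonl") as f: print(short_scenarios(f))` prints an empty dict when every scenario has its full set of tasks.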

Step 3: Database Synthesis

We define the database schema and then populate its initial state with sample data so that it fully supports the generated tasks.

# database schema
awm gen db \
--input outputs/gen_tasks.jsonl \
--output outputs/gen_db.jsonl

# sample data for initial state
awm gen sample \
--input_task outputs/gen_tasks.jsonl \
--input_db outputs/gen_db.jsonl \
--output outputs/gen_sample.jsonl
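The two outputs of this step, a schema and sample rows, together define the initial database state, which is presumably what `awm env reset_db` restores. As an illustration only (not AWM's implementation), here is a sketch that builds a SQLite database from DDL plus INSERT statements, snapshots it, and restores it after an agent's write; the `users` table and the `build_db`, `snapshot`, and `restore` helpers are hypothetical.

```python
import sqlite3

SCHEMA = "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, balance REAL NOT NULL);"
SAMPLE = "INSERT INTO users (name, balance) VALUES ('alice', 100.0), ('bob', 50.0);"


def build_db(schema_sql: str, sample_sql: str) -> sqlite3.Connection:
    """Create the schema, then insert the sample rows that form the initial state."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(schema_sql)
    conn.executescript(sample_sql)
    return conn


def snapshot(conn: sqlite3.Connection) -> sqlite3.Connection:
    """Copy the current database into a separate in-memory database."""
    copy = sqlite3.connect(":memory:")
    conn.backup(copy)
    return copy


def restore(initial: sqlite3.Connection, live: sqlite3.Connection) -> None:
    """Overwrite the live database with the saved initial state."""
    initial.backup(live)


live = build_db(SCHEMA, SAMPLE)
initial = snapshot(live)
live.execute("UPDATE users SET balance = 0 WHERE name = 'alice'")  # simulated agent write
live.commit()
restore(initial, live)
print(live.execute("SELECT balance FROM users WHERE name = 'alice'").fetchone())
```

Keeping a pristine copy and restoring it wholesale avoids having to undo individual agent writes one by one.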

Step 4: Interface Synthesis

We first generate an API specification, which guides the generation of the environment's Python code behind the MCP interface.

# API spec (interface schema)
awm gen spec \
--input_task outputs/gen_tasks.jsonl \
--input_db outputs/gen_db.jsonl \
--output outputs/gen_spec.jsonl

# Environment code
awm gen env \
--input_spec outputs/gen_spec.jsonl \
--input_db outputs/gen_db.jsonl \
--output outputs/gen_envs.jsonl
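Steps 1 to 4 chain together through JSONL files, each stage reading earlier outputs. The CLI already offers `awm gen all` for the full pipeline; if you want to run or resume the stages individually, here is a sketch of a small Python driver that issues exactly the commands shown above, in order, and stops at the first failure. `PIPELINE` and `run_pipeline` are our own illustration, and the driver covers Steps 1 to 4 only.

```python
import subprocess

# Commands copied from Steps 1-4 above; each stage consumes earlier outputs.
PIPELINE: list[list[str]] = [
    ["awm", "gen", "scenario", "--input_path", "outputs/seed_scenario.jsonl",
     "--output_path", "outputs/gen_scenario.jsonl", "--target_count", "1000"],
    ["awm", "gen", "task", "--input", "outputs/gen_scenario.jsonl",
     "--output", "outputs/gen_tasks.jsonl"],
    ["awm", "gen", "db", "--input", "outputs/gen_tasks.jsonl", "--output", "outputs/gen_db.jsonl"],
    ["awm", "gen", "sample", "--input_task", "outputs/gen_tasks.jsonl",
     "--input_db", "outputs/gen_db.jsonl", "--output", "outputs/gen_sample.jsonl"],
    ["awm", "gen", "spec", "--input_task", "outputs/gen_tasks.jsonl",
     "--input_db", "outputs/gen_db.jsonl", "--output", "outputs/gen_spec.jsonl"],
    ["awm", "gen", "env", "--input_spec", "outputs/gen_spec.jsonl",
     "--input_db", "outputs/gen_db.jsonl", "--output", "outputs/gen_envs.jsonl"],
]


def run_pipeline(start: int = 0, dry_run: bool = False) -> list[str]:
    """Run stages from index `start`; return the commands issued, as strings."""
    issued: list[str] = []
    for cmd in PIPELINE[start:]:
        issued.append(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return issued
```

For example, `run_pipeline(start=4)` resumes at `awm gen spec`, and `run_pipeline(dry_run=True)` only lists the commands. Child processes inherit the shell environment, so the API keys exported earlier still apply.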

Step 5: Verification Synthesis

We provide two options for verification:
  1. code-augmented LLM-as-a-Judge (sql)
  2. purely code-based Judge (code)
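To give a feel for the purely code-based (code) option, here is a hypothetical verifier for a task like "cancel order 1": it inspects the final database state after the agent's run and returns pass or fail without any LLM call. The task, the `orders` table, and `verify_cancel_order` are illustrative; the verifiers AWM actually generates may be structured differently.

```python
import sqlite3


def verify_cancel_order(conn: sqlite3.Connection, order_id: int) -> bool:
    """Hypothetical code-based check: did the agent cancel exactly this order?"""
    row = conn.execute("SELECT status FROM orders WHERE id = ?", (order_id,)).fetchone()
    if row is None or row[0] != "cancelled":
        return False
    # Guard against over-eager agents: no other order should have been cancelled.
    others = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE status = 'cancelled' AND id != ?", (order_id,)
    ).fetchone()[0]
    return others == 0


# Simulated final state after an agent run.
final = sqlite3.connect(":memory:")
final.executescript(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, status TEXT);"
    "INSERT INTO orders VALUES (1, 'laptop', 'cancelled'), (2, 'mouse', 'shipped');"
)
print(verify_cancel_order(final, 1))
```

Checking for unintended side effects, not just the target row, is what keeps a state-based check from rewarding agents that change more than the task asked for.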

awm gen verifier \
--mode sql \
--input_task outputs/gen_tasks.jsonl \…


Notability

notability 5.0/10

New research repo with moderate traction.