Repo · InclusionAI (Ant Group) · published Jul 1, 2025

inclusionAI/AWorld-RL


Description: Agentic Learning Powered by AWorld

Language: Python

License: MIT

Stars: 110

Forks: 10

Open issues: 2

Created: 2025-07-01T07:52:11Z

Pushed: 2026-04-16T03:28:08Z

Default branch: main

Fork: no

Archived: no

README:

arXiv(HardGen) | arXiv(V2P) | arXiv(RAG-R1) | arXiv(FunReason) | arXiv(EnvTuning) | arXiv(FunReason-MT)

EnvTuning

📣 News

[2026/04/06] 🎉🎉🎉**FunReason (BalanceSFT)** was accepted as a Findings paper at ACL 2026!

[2026/01/26] 🎉🎉🎉**Environment Tuning** was accepted at ICLR 2026!

[2026/01/04] 🔥🔥🔥[HardGen](./FunReason-MT) We propose HardGen, an extension of FunReason-MT.

[2025/10/29] 🔥🔥🔥[FunReason-MT](./FunReason-MT) We propose FunReason-MT, a novel data synthesis framework designed to address critical bottlenecks in multi-turn Function Calling (FC) data generation, achieving excellent performance in complex agentic tasks.

[2025/10/22] 🔥🔥🔥[EnvTuning](./EnvTuning) We propose Environment Tuning, a novel training paradigm that enables agents to learn complex multi-turn tool-use behaviors through environmental interaction rather than trajectory imitation, achieving significant improvements with only 400 training samples. (Accepted at ICLR 2026)

[2025/08/19] 🔥🔥🔥[V2P](./V2P) We propose V2P, a novel training method for multi-modal models that enables coordinate-free, human-like visual GUI Grounding.

[2025/07/01] 🔥🔥🔥[RAG-R1](./RAG-R1) We propose RAG-R1, a deepsearch training framework that incentivizes the search and reasoning capabilities of LLMs through multi-query parallelism. (Accepted at AAAI 2026)

[2025/05/16] 🔥🔥🔥**FunReason** We propose FunReason, a novel framework that enhances LLMs' function calling capabilities through an automated data refinement strategy and a Self-Refinement Multiscale Loss approach. (Accepted at ACL 2026)

📖 Introduction

AWorld-RL is a comprehensive collection of cutting-edge agentic reinforcement learning algorithms developed by the AWorld Team. Built upon the AWorld Framework, this repository provides complete codebases, datasets, and checkpoints for training and evaluating autonomous agents that learn through multi-turn interactions with dynamic environments.

Our work focuses on enabling agents to effectively leverage environmental feedback for complex problem-solving across diverse domains, including multi-modal understanding, deep search, and function calling.

![AgenticLearning Framework](assets/framework.png "AgenticLearning Framework")

🚀 Projects

[From Failure to Mastery: Generating Hard Samples for Tool-use Agents](./FunReason-MT)

Authors: Bingguang Hao, Zengzhuang Xu, Yuntao Wen, Xinyi Xu, Yang Liu et al.

[FunReason-MT Technical Report: Advanced Data Synthesis Solution for Real-world Multi-Turn Tool-use](./FunReason-MT)

Authors: Zengzhuang Xu, Bingguang Hao, Zechuan Wang et al.

[Don't Just Fine-tune the Agent, Tune the Environment](./EnvTuning)

Authors: Siyuan Lu, Zechuan Wang, Hongxuan Zhang, Qintong Wu, Leilei Gan, Chenyi Zhuang, Jinjie Gu, Tao Lin (ICLR 2026)

[V2P: From Background Suppression to Center Peaking for Robust GUI Grounding](./V2P)

Authors: Jikai Chen, Long Chen, Dong Wang, Leilei Gan, Chenyi Zhuang, Jinjie Gu

[RAG-R1: Incentivizing the Search and Reasoning Capabilities of LLMs Through Multi-query Parallelism](./RAG-R1)

Authors: Zhiwen Tan, Jiaming Huang, Qintong Wu, Hongxuan Zhang, Chenyi Zhuang, Jinjie Gu (AAAI 2026)

[FunReason: Enhancing Large Language Models' Function Calling via Self-Refinement Multiscale Loss and Automated Data Refinement](https://github.com/BingguangHao/FunReason/)

Authors: Bingguang Hao, Maolin Wang, Zengzhuang Xu, Cunyin Peng, Yicheng Chen, Xiangyu Zhao, Jinjie Gu, Chenyi Zhuang (ACL 2026)

📚 Overview

Table of Contents

  • [Multi-Modal](#multi-modal)
      • [V2P](#v2p)
  • [Deepsearch](#deepsearch)
      • [RAG-R1](#rag-r1)
  • [Tool Use](#tool-use)
      • [FunReason-MT](#funreason-mt)
      • [Environment Tuning](#environment-tuning)
      • [FunReason](#funreason)

Multi-Modal

[V2P](./V2P)

  • Tools: PyAutoGUI Tools
  • LLM: Qwen2.5-7b-instruct

Deepsearch

[RAG-R1](./RAG-R1)

  • Tools: Search Engines (offline or online)
  • LLM: Qwen2.5-7b-instruct
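
The multi-query parallelism that RAG-R1 is built around can be pictured with a small sketch: instead of issuing one search per reasoning step, the model emits several queries at once and they are resolved concurrently. This is only an illustration, not the RAG-R1 implementation. `toy_search`, `TOY_CORPUS`, and `parallel_search` are hypothetical names invented here, and the exact-match lookup stands in for a real offline or online search engine.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical toy corpus standing in for an offline search index (assumption).
TOY_CORPUS: Dict[str, str] = {
    "capital of france": "doc: Paris is the capital of France",
    "author of hamlet": "doc: Hamlet was written by William Shakespeare",
}


def toy_search(query: str) -> str:
    """Hypothetical search backend: exact-match lookup in TOY_CORPUS."""
    return TOY_CORPUS.get(query.lower().strip(), "no result")


def parallel_search(
    queries: List[str],
    search_fn: Callable[[str], str],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Resolve all queries concurrently and return {query: result}.

    Results are paired with queries in input order; duplicate queries
    collapse into a single dictionary entry.
    """
    if not queries:
        return {}
    workers = min(max_workers, len(queries))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the input queries.
        results = list(pool.map(search_fn, queries))
    return dict(zip(queries, results))


if __name__ == "__main__":
    # One reasoning step that fans out into two independent queries.
    for query, result in parallel_search(
        ["capital of france", "author of hamlet"], toy_search
    ).items():
        print(f"{query!r} -> {result}")
```

Threads are used here because search calls are typically I/O-bound; a real system might batch requests to the retriever instead.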

Tool Use

[FunReason-MT](./FunReason-MT)

  • Tools: Multi-turn Tool Use (BFCLv3 Benchmark)
  • LLM: Qwen3-4b-Instruct-2507

##### Key Highlights:

  • State-of-the-Art Performance: A 4B model trained on FunReason-MT data achieves state-of-the-art results among similarly sized models on the Berkeley Function-Calling Leaderboard (BFCLv3) Multi-Turn benchmark.
  • Closed-Source Model Outperformance: The FunReason-MT RL-trained 4B model surpasses most leading closed-source models (e.g., GPT-5, Gemini-2.5-Pro, Claude-Sonnet-4) and open-source models (e.g., DeepSeek-R1) in Multi-Turn evaluation.
  • Robust Framework: The solution addresses three structural deficiencies in data generation: Targeted Model Training, Isolation of Tool Architecture, and Multi-Turn Logical Dependency.
  • Agentic Generalization: The model demonstrates promising out-of-distribution generalization and improved agentic capability on the BFCLv4 benchmark (Web Search and Memory tasks).

---

##### 🔬 Methodology: The FunReason-MT Framework

The framework tackles complexity and reliability challenges by breaking the data generation process into three core phases:

| Phase | Core Component | Challenge Addressed | Description |
| :--- | :--- | :--- | :--- |
| Phase I | Environment-API Graph Interactions | Targeted Model Training | Samples tool calls using a Directed Sampler to efficiently collect… |
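
The excerpt does not say how the Directed Sampler works, so the following is only one plausible reading of "directed sampling over an environment-API graph", not the FunReason-MT method. It assumes a hypothetical acyclic dependency graph (`API_GRAPH`) in which an edge A -> B means tool B can consume the output of tool A, and it samples a call sequence that is guaranteed to end at a chosen target tool. All tool names and function names are invented for illustration.

```python
import random
from typing import Dict, List, Optional, Set

# Hypothetical API dependency graph (assumption): edge A -> B means
# tool B can consume the output of tool A. Assumed to be acyclic.
API_GRAPH: Dict[str, List[str]] = {
    "login": ["list_files", "get_balance"],
    "list_files": ["read_file"],
    "read_file": [],
    "get_balance": ["transfer"],
    "transfer": [],
}


def reaches(
    graph: Dict[str, List[str]],
    start: str,
    target: str,
    seen: Optional[Set[str]] = None,
) -> bool:
    """Depth-first check of whether target is reachable from start."""
    if start == target:
        return True
    seen = set() if seen is None else seen
    seen.add(start)
    return any(
        nxt not in seen and reaches(graph, nxt, target, seen)
        for nxt in graph[start]
    )


def sample_directed_path(
    graph: Dict[str, List[str]],
    start: str,
    target: str,
    rng: random.Random,
) -> List[str]:
    """Random walk from start that only follows edges still leading to target."""
    path = [start]
    node = start
    while node != target:
        candidates = [n for n in graph[node] if reaches(graph, n, target)]
        if not candidates:
            raise ValueError(f"{target!r} is not reachable from {start!r}")
        node = rng.choice(candidates)
        path.append(node)
    return path


if __name__ == "__main__":
    print(sample_directed_path(API_GRAPH, "login", "transfer", random.Random(0)))
```

Restricting each step to successors that can still reach the target is what makes the walk "directed" in this sketch: every sampled trajectory exercises the target tool, rather than wandering to unrelated calls.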
