ByteDance-Seed/DAComp
Description: [ICLR 2026] DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle
Language: Python
License: NOASSERTION
Stars: 425
Forks: 3
Open issues: 5
Created: 2025-12-08T10:08:00Z
Pushed: 2026-01-04T10:20:11Z
Default branch: main
Fork: no
Archived: no
README:
📰 News
- 2025-12-08: 🔥 We release the DAComp dataset and the paper.
👋 Overview
DAComp offers a research-grade benchmark spanning full data intelligence workflows: repository-level data engineering (DAComp-DE), open-ended data analysis (DAComp-DA), a Chinese-localized split (DAComp-zh), and accompanying baseline agents with evaluation suites curated in this repository.
🔍 Installation
Set up the environment using the following commands:
conda create -n dacomp python=3.12
conda activate dacomp
pip install -r requirements.txt
pip install openhands-ai
conda install -c conda-forge nodejs
conda install -c conda-forge poetry
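Before running any agent, it can help to confirm that the tools installed above are visible from the active environment. The following is a minimal sketch and not part of the repository: the function name `check_environment` and the list of tools to check are illustrative assumptions, and it relies only on the Python standard library.

```python
import shutil
import sys

# Assumption: these are the command-line tools provided by the install steps
# (the conda-forge "nodejs" package provides the `node` executable).
REQUIRED_TOOLS = ("node", "poetry")


def check_environment(required_tools=REQUIRED_TOOLS, expected_python=(3, 12)):
    """Return a dict mapping each check name to True (ok) or False (missing)."""
    report = {"python": sys.version_info[:2] == expected_python}
    for tool in required_tools:
        # shutil.which returns None when the executable is not on PATH.
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Run it inside the activated `dacomp` environment; any line reporting `MISSING` points to an install step that needs to be repeated.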
🚀 Quick access DAComp Dataset
DAComp consists of two subsets: DA (Analysis) and DE (Engineering). You can download the dataset from DAComp.
Please use the provided scripts in [dacomp-da/download.py](./dacomp-da/README.md) and [dacomp-de/download.py](./dacomp-de/README.md) to download the data automatically.
# --- Download DAComp-DA Dataset ---
# Run from the repository root.
cd dacomp-da
# Download the DAComp-DA dataset: English tasks into `dacomp-da/tasks` and Chinese tasks into `dacomp-da/tasks_zh`.
# Change repo_id and download_dir in download.py.
python download.py

# --- Download DAComp-DE Dataset ---
cd ../dacomp-de
# Download the DAComp-DE dataset: English tasks into `dacomp-de/tasks` and Chinese tasks into `dacomp-de/tasks_zh`.
# Change repo_id and download_dir in download.py.
python download.py
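After the download finishes, a quick check of the resulting directory layout can catch a wrong `download_dir` early. The sketch below is an illustration rather than repository code: the helper name `summarize_download` is hypothetical, and the split directory names (`tasks`, `tasks_zh`) are taken from the comments in the download commands above.

```python
from pathlib import Path

# Assumption: split directory names follow the download comments
# (English tasks in `tasks`, Chinese tasks in `tasks_zh`).
EXPECTED_SPLITS = ("tasks", "tasks_zh")


def summarize_download(subset_dir, splits=EXPECTED_SPLITS):
    """Count top-level entries in each expected split directory.

    Returns a dict mapping split name to the number of entries, or None
    when the split directory does not exist.
    """
    root = Path(subset_dir)
    summary = {}
    for split in splits:
        split_dir = root / split
        if split_dir.is_dir():
            summary[split] = sum(1 for _ in split_dir.iterdir())
        else:
            summary[split] = None
    return summary


if __name__ == "__main__":
    # Run from the repository root so the relative paths resolve.
    for subset in ("dacomp-da", "dacomp-de"):
        print(subset, summarize_download(subset))
```

A value of `None` means the split directory was not created, which usually suggests that `download_dir` points somewhere else.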
🚀 Quickstart
DAComp-DA
- Agents: pick `methods/da-agent` (three-stage baseline), `methods/spider-agent` (single, image-first baseline), or OpenHands; fill in your model config, install requirements, and run `run.py` as shown in each agent's [README](./methods/README.md).
DAComp-DE
- Agents: pick `methods/de-agent` (OpenHands integration); fill in your model config and install requirements as shown in its [README](./methods/de-agent/README.md).
⚖️ Evaluation
DAComp-DA
- Standard DAComp-DA Tasks: follow [dacomp-da/evaluation_suite/README.md](./dacomp-da/evaluation_suite/README.md) to evaluate DA tasks.
- Results: export a run to `dacomp-da/evaluation_suite/agent_results` with `get_results.py` from the agent folder.
DAComp-DE
- Standard DAComp-DE Tasks: follow [dacomp-de/evaluation_suite/README.md](./dacomp-de/evaluation_suite/README.md) to evaluate DE-Impl and DE-Evol tasks.
- DE-Arch Unified Evaluator: follow [dacomp-de/evaluation_suite_arch/README.md](./dacomp-de/evaluation_suite_arch/README.md) to evaluate DE-Arch tasks.
📋 Leaderboard Submission
To submit your agent results to the leaderboard, please follow the instructions in DAComp Submission Guidelines.
🙇♂️ Acknowledgement
We thank the OpenHands team for their valuable contributions to the open-source community.
✍️ Citation
If you find our work helpful, please cite as
@misc{lei2025dacomp,
title={DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle},
author={Fangyu Lei and Jinxiang Meng and Yiming Huang and Junjie Zhao and Yitong Zhang and Jianwen Luo and Xin Zou and Ruiyi Yang and Wenbo Shi and Yan Gao and Shizhu He and Zuo Wang and Qian Liu and Yang Wang and Ke Wang and Jun Zhao and Kang Liu},
year={2025},
eprint={2512.04324},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2512.04324},
}
🌱 About ByteDance Seed Team
Founded in 2023, ByteDance Seed Team is dedicated to crafting the industry's most advanced AI foundation models. The team aspires to become a world-class research team and make significant contributions to the advancement of science and society.
Notability: 5.0/10. New repo by ByteDance with moderate stars.
ByteDance (Doubao/Seed) has a repo signal matching data demand, evals and quality.