OpenBMB/MoRE
Description: [SIGIR '26] Mixture-of-Retrieval Experts for Reasoning-Guided Multimodal Knowledge Exploitation
Language: Python
Stars: 43
Forks: 2
Open issues: 0
Created: 2025-05-22T11:51:31Z
Pushed: 2026-05-15T02:41:26Z
Default branch: main
Fork: no
Archived: no
README:
News
2026.04.03 Our work has been accepted by SIGIR 2026 🎉🎉🎉!
2025.08.22 We uploaded MoRE-3B.
Environment
For training, answer generation, and evaluation processes:
conda create -n router python=3.11
conda activate router
pip install -r requirements_router.txt
For retriever and corpus construction processes:
conda create -n retriever python=3.11
conda activate retriever
pip install -r requirements_retriever.txt
Corpora Construction
For the text corpus, you can download enwiki-20241020 from Huggingface. Then preprocess and index it with the following commands:
7z x enwiki-20241020-pages-articles-multistream.xml.zip.001
conda activate retriever
wikiextractor enwiki-20241020-pages-articles-multistream.xml.bz2 -o wiki_extracted
python wiki_preprocess.py
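The contents of wiki_preprocess.py are not shown in this README. As a rough illustration of what a preprocessing step over the wikiextractor output might look like, the sketch below parses the `<doc ...> ... </doc>` blocks that wikiextractor writes by default and splits each article into fixed-length passages. The function names, the 100-word passage size, and the output fields are assumptions made for illustration, not the repository's actual implementation.

```python
import re
from pathlib import Path
from typing import Iterator

# wikiextractor's default (non-JSON) output wraps each article in
# <doc id="..." url="..." title="..."> ... </doc>
DOC_RE = re.compile(r"<doc\s+(?P<attrs>[^>]*)>\n(?P<body>.*?)</doc>", re.DOTALL)
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def iter_docs(text: str) -> Iterator[dict]:
    """Yield one dict per <doc> block found in a wikiextractor output file."""
    for m in DOC_RE.finditer(text):
        attrs = dict(ATTR_RE.findall(m["attrs"]))
        yield {
            "id": attrs.get("id", ""),
            "title": attrs.get("title", ""),
            "text": m["body"].strip(),
        }


def chunk_words(text: str, size: int = 100) -> list[str]:
    """Split text into consecutive chunks of at most `size` words.

    The default of 100 words is a hypothetical choice.
    """
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def iter_passages(root: str, size: int = 100) -> Iterator[dict]:
    """Walk the extraction directory and yield passages ready for indexing."""
    for path in sorted(Path(root).rglob("wiki_*")):
        if not path.is_file():
            continue
        for doc in iter_docs(path.read_text(encoding="utf-8")):
            for n, chunk in enumerate(chunk_words(doc["text"], size)):
                yield {
                    "id": f'{doc["id"]}_{n}',
                    "title": doc["title"],
                    "text": chunk,
                }


# Hypothetical usage against the directory produced by the command above:
# for passage in iter_passages("wiki_extracted"):
#     print(passage["id"], passage["title"])
```

Parsing attributes generically, rather than relying on a fixed attribute order, keeps the sketch tolerant of small differences between wikiextractor versions. Titles containing a double quote are not handled.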
For the image corpus, you can directly download M-BEIR. To embed and index it, you can follow the repository.
For the table corpus, you can download, embed, and index Open-WikiTable following the repository, or you can directly download the one we have already preprocessed from here.
Retrievers Preparation
For the Text-Image Retriever, you can directly download UniIR.
For the Table Retriever, you can train it with the help of the repository, or you can download it directly from here.
Datasets
We have prepared all the text datasets in ./datasets. The images need to be downloaded separately:
InfoSeek: InfoSeek images can be downloaded from OVEN
Dyn-VQA: Dynamic VQA images can be downloaded from DynVQA_en.202412
WebQA: WebQA images can be downloaded from Google Drive
Training
If you do not want to train the model, you can download MoRE and skip ahead to [Evaluation](#evaluation).
Data Synthesis
If you want to use the ready-to-use synthetic data directly, you can skip ahead to [Step-GRPO Training](#step-grpo-training).
First, we need to synthesize the data step by step:
bash src/data_synthesis/data_synthesis.sh
Step-GRPO Training
Our training framework is based on EasyR1. Download EasyR1 and replace the corresponding files with the files in ./Easy-R1. Then start training with the command:
conda activate router
bash examples/run_qwen2_5_vl_7b_stepgrpo.sh
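The file replacement described in this section (copying the files from ./Easy-R1 over an EasyR1 checkout before launching the training script) could be automated roughly as follows. This is a minimal sketch, assuming the files under ./Easy-R1 mirror the directory layout of the EasyR1 repository, which this README does not state explicitly. The helper name `overlay_tree` and the target path in the usage comment are hypothetical.

```python
import shutil
from pathlib import Path


def overlay_tree(patch_dir: str, target_dir: str) -> list[str]:
    """Copy every file under patch_dir onto the same relative path in target_dir.

    Existing files in target_dir are overwritten; files that exist only in
    target_dir are left untouched. Returns the relative paths that were copied.
    """
    patch_root, target_root = Path(patch_dir), Path(target_dir)
    copied = []
    for src in sorted(patch_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(patch_root)
        dst = target_root / rel
        # Create intermediate directories in case the patch adds new ones.
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(rel.as_posix())
    return copied


# Hypothetical usage; adjust the second path to your EasyR1 checkout:
# for rel in overlay_tree("./Easy-R1", "/path/to/EasyR1"):
#     print("replaced", rel)
```

Printing the returned list makes it easy to review which upstream files were overwritten before starting a run.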
Evaluation
We provide the evaluation pipeline for the R1-Router:
bash evaluation.sh
Alternatively, you can evaluate the results we have provided with:
conda activate router
cd src
python evaluate.py --dataset_name all --method "r1-router3"
Acknowledgement
Our work is built on the following codebases, and we are deeply grateful for their contributions.
Citation
We appreciate your citations if you find our paper relevant and useful to your research!
@article{peng2025mixture,
title={Mixture-of-Retrieval Experts for Reasoning-Guided Multimodal Knowledge Exploitation},
author={Peng, Chunyi and Xu, Zhipeng and Liu, Zhenghao and Li, Yishan and Yan, Yukun and Wang, Shuo and Gu, Yu and Yu, Minghe and Yu, Ge and Sun, Maosong},
year={2025},
url={https://arxiv.org/abs/2505.22095},
}
Contact Us
If you have questions, suggestions, or bug reports, please email us. We will try our best to help you.
hm.cypeng@gmail.com
Notability: 3.0/10. New repo, low stars, from notable lab.