Repo by Xiaomi (MiMo), published Nov 19, 2025

XiaomiMiMo/MiMo-Embodied

Python

Description: MiMo-Embodied

Language: Python

License: NOASSERTION

Stars: 386

Forks: 15

Open issues: 0

Created: 2025-11-19T08:54:41Z

Pushed: 2026-04-15T12:28:08Z

Default branch: main

Fork: no

Archived: no

README:

I. Introduction

This repository provides the official evaluation suite of MiMo-Embodied, designed to support rigorous and reproducible evaluation for embodied AI and autonomous driving tasks.

Built on top of the excellent lmms-eval framework, this repository extends the evaluation pipeline with MiMo-specific model integration, benchmark support, and evaluation workflows for embodied and driving scenarios.

MiMo-Embodied is a cross-embodied vision-language model (VLM) that the authors report achieves state-of-the-art performance on both autonomous driving and embodied AI tasks. They describe it as the first open-source VLM to integrate these two areas.

> This repository is for evaluation only. It does not contain model training code.

---

II. Key Features

1. MiVLLM: A MiMo-tailored vLLM-based Model Wrapper

We use a custom mivllm model class, built on top of the original vLLM implementation in lmms-eval and tailored for MiMo models. Compared with the default implementation, it:

  • improves data loading efficiency
  • enables finer control over image and video preprocessing
  • supports MiMo-specific inference settings such as:
      • max_model_len
      • gpu_memory_utilization
      • max_num_seqs
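lmms-eval passes model options as a single comma-separated key=value string (its --model_args option). The sketch below serializes the three settings listed above into that form and rejects obviously invalid values; it assumes mivllm accepts these exact key names, which should be checked against the class itself. build_model_args is a hypothetical helper, and the numbers are illustrative, not recommended defaults.

```python
from typing import Dict, Union

Setting = Union[int, float, str]


def build_model_args(settings: Dict[str, Setting]) -> str:
    """Serialize inference settings into a comma-separated key=value string.

    Assumption: the mivllm wrapper receives these keys through lmms-eval's
    --model_args option; check the class signature for the exact names.
    """
    util = settings.get("gpu_memory_utilization")
    if util is not None and not 0.0 < float(util) <= 1.0:
        # vLLM treats this value as the fraction of GPU memory it may use.
        raise ValueError("gpu_memory_utilization must be in (0, 1]")
    for key in ("max_model_len", "max_num_seqs"):
        value = settings.get(key)
        if value is not None and (not isinstance(value, int) or value <= 0):
            raise ValueError(f"{key} must be a positive integer")
    parts = []
    for key, value in settings.items():
        text = str(value)
        # Commas and '=' would break the key=value,key=value format.
        if "," in text or "=" in text:
            raise ValueError(f"value for {key!r} must not contain ',' or '='")
        parts.append(f"{key}={text}")
    return ",".join(parts)


# Illustrative values only; tune them for your GPUs and context length.
print(build_model_args({
    "max_model_len": 32768,
    "gpu_memory_utilization": 0.85,
    "max_num_seqs": 32,
}))
```

Validating up front means a typo such as gpu_memory_utilization=85 fails immediately instead of after the model starts loading.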

2. Evaluation for Embodied AI

This evaluation suite supports embodied AI benchmarks covering key capabilities such as:

  • affordance prediction
  • task planning
  • spatial understanding

3. Evaluation for Autonomous Driving

This evaluation suite also supports autonomous driving benchmarks covering key capabilities such as:

  • environmental perception
  • status prediction
  • driving planning
  • driving knowledge-based QA

4. Flexible Evaluation Workflows

The framework supports:

  • single-GPU evaluation
  • multi-GPU evaluation
  • multi-node distributed evaluation
  • batch evaluation across multiple tasks

---

III. Benchmark Coverage

This repository focuses on the evaluation of embodied AI and autonomous driving tasks.

Embodied AI Benchmarks

| Category | Benchmarks (task name) |
|---|---|
| Affordance & Planning | Where2Place (where2place_point), RoboAfford-Eval (roboafford), Part-Afford (part_affordance), RoboRefIt (roborefit), VABench-Point (vabench_point_box) |
| Planning | EgoPlan2 (egoplan), RoboVQA (robovqa), Cosmos (cosmos_reason1_boxed) |
| Spatial Understanding | CV-Bench (cvbench_boxed), ERQA (erqa_boxed), EmbSpatial (embspatialbench), SAT (sat), RoboSpatial (robospatial), RefSpatial (refspatialbench), CRPE (crpe_relation), MetaVQA (metavqa_eval), VSI-Bench (vsibench_boxed) |

Autonomous Driving Benchmarks

| Benchmark | Task name |
|---|---|
| CODA-LM | codalm |
| Drama | drama |
| DriveAction | drive_action_boxed_detail |
| LingoQA | lingoqa_boxed |
| nuScenes-QA | nuscenesqa |
| OmniDrive | omnidrive |
| NuInstruct | nuinstruct |
| DriveLM | drivelm |
| MAPLM | maplm |
| BDD-X | bddx |
| MME-RealWorld | mme_realworld |
| IDKB | idkb |
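When scripting runs over a subset of these benchmarks, it helps to keep the task names from the two tables in one place. The sketch below is a hypothetical helper (EMBODIED_TASKS, DRIVING_TASKS, and select_tasks are not part of the repository); the task names are copied from the tables, but whether the launcher accepts a comma-joined list of tasks is not stated in this README.

```python
from typing import Dict, List, Optional, Sequence

# Task names copied from the two benchmark tables in this section.
EMBODIED_TASKS: Dict[str, List[str]] = {
    "Affordance & Planning": [
        "where2place_point", "roboafford", "part_affordance",
        "roborefit", "vabench_point_box",
    ],
    "Planning": ["egoplan", "robovqa", "cosmos_reason1_boxed"],
    "Spatial Understanding": [
        "cvbench_boxed", "erqa_boxed", "embspatialbench", "sat", "robospatial",
        "refspatialbench", "crpe_relation", "metavqa_eval", "vsibench_boxed",
    ],
}
DRIVING_TASKS: List[str] = [
    "codalm", "drama", "drive_action_boxed_detail", "lingoqa_boxed",
    "nuscenesqa", "omnidrive", "nuinstruct", "drivelm", "maplm", "bddx",
    "mme_realworld", "idkb",
]


def select_tasks(categories: Optional[Sequence[str]] = None,
                 include_driving: bool = False) -> List[str]:
    """Return task names for the chosen embodied categories (all if None),
    optionally followed by every driving task, without duplicates."""
    chosen = list(EMBODIED_TASKS) if categories is None else list(categories)
    selected: List[str] = []
    for category in chosen:
        selected.extend(EMBODIED_TASKS[category])  # KeyError on an unknown category
    if include_driving:
        selected.extend(DRIVING_TASKS)
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(selected))


print(",".join(select_tasks(["Planning"])))
```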

> A more detailed task list is kept in mimovl_docs/tasks.md.

---

IV. Usage

Installation

```bash
# Step 1: Create conda environment
conda create -n lmms-eval python=3.10 -y
conda activate lmms-eval

# Step 2: Install PyTorch (adjust CUDA version as needed)
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124

# Step 3: Install vLLM
pip install vllm==0.7.3

# Step 4: Install the evaluation framework
git clone https://github.com/XiaomiMiMo/MiMo-Embodied.git
cd MiMo-Embodied
pip install -e . && pip uninstall -y opencv-python-headless
pip install -r requirements.txt

# Step 5 (optional but recommended)
pip install xformers==0.0.28.post3
```
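Because later pip steps can silently pull in different torch or vLLM versions, it can be worth confirming the pins after installation. The sketch below reads installed versions with the standard-library importlib.metadata and compares them to the versions in the steps above; find_mismatches and installed_versions are hypothetical helpers, not repository scripts. It ignores local version tags such as +cu124, which wheels from the PyTorch CUDA index typically carry.

```python
from importlib import metadata
from typing import Dict, Iterable, List

# Versions pinned in the installation steps above (xformers is the optional step 5).
PINS: Dict[str, str] = {
    "torch": "2.5.1",
    "torchvision": "0.20.1",
    "torchaudio": "2.5.1",
    "vllm": "0.7.3",
    "xformers": "0.0.28.post3",
}


def installed_versions(names: Iterable[str]) -> Dict[str, str]:
    """Look up installed distribution versions, skipping missing packages."""
    found: Dict[str, str] = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass
    return found


def find_mismatches(installed: Dict[str, str], pins: Dict[str, str]) -> List[str]:
    """Compare installed versions with the pins, ignoring local tags like +cu124."""
    problems = []
    for name, wanted in pins.items():
        have = installed.get(name)
        if have is None:
            problems.append(f"{name}: not installed (expected {wanted})")
        elif have.split("+", 1)[0] != wanted:
            problems.append(f"{name}: found {have}, expected {wanted}")
    return problems


if __name__ == "__main__":
    report = find_mismatches(installed_versions(PINS), PINS)
    print("\n".join(report) if report else "all pinned packages match")
```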

Dataset Paths

For many benchmarks, images are already packaged in the corresponding Hugging Face dataset, so no additional local path configuration is required.

For some benchmarks with large image/video assets, the released config YAML uses a placeholder local path such as:

```yaml
img_root: "/path/to/your/image_or_video_data"
```

Before running evaluation for these benchmarks, please manually update img_root in the corresponding task YAML file to point to your local image/video directory.

For example:

```yaml
dataset_path: Zray26/bdd_x_testing_caption
task: "bddx"
test_split: test
dataset_kwargs:
  token: True

output_type: generate_until
img_root: "/path/to/your/image_or_video_data"
doc_to_visual: !function utils.doc_to_visual
doc_to_text: !function utils.doc_to_text
doc_to_target: !function utils.doc_to_target
process_results: !function utils.process_test_results_for_submission
```
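Editing img_root by hand works, but it can also be scripted. A minimal sketch, assuming each task YAML has exactly one top-level img_root line; set_img_root is a hypothetical helper, and /data/bdd_x is a made-up path. It rewrites the line with a regular expression instead of parsing the file, because the !function tags are not loadable by yaml.safe_load without a custom constructor, and a parse-and-dump round trip would also drop comments and ordering.

```python
import re

# Matches the whole img_root line, keeping its indentation and key.
_IMG_ROOT_LINE = re.compile(r"^([ \t]*img_root:[ \t]*).*$", re.MULTILINE)


def set_img_root(yaml_text: str, new_root: str) -> str:
    """Return yaml_text with the img_root value replaced by new_root (quoted)."""
    updated, count = _IMG_ROOT_LINE.subn(
        lambda m: f'{m.group(1)}"{new_root}"', yaml_text
    )
    if count != 1:
        raise ValueError(f"expected exactly one img_root line, found {count}")
    return updated


if __name__ == "__main__":
    demo = 'task: "bddx"\nimg_root: "/path/to/your/image_or_video_data"\n'
    # "/data/bdd_x" is a hypothetical local directory.
    print(set_img_root(demo, "/data/bdd_x"))
```

To apply it, read lmms_eval/tasks/bddx/bddx.yaml, pass its text through set_img_root, and write the result back.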

A typical task folder is organized as:

```
lmms_eval/tasks//
├── .yaml
└── utils.py
```

For example:

```
lmms_eval/tasks/bddx/
├── bddx.yaml
└── utils.py
```
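The doc_to_* and process_results entries in the YAML point to functions in the task's utils.py. A hypothetical minimal utils.py might look like this. The document fields image, question, and answer, the IMG_ROOT constant, and the exact-match metric are illustrative assumptions. The signatures follow lmms-eval's usual pattern (including the optional lmms_eval_specific_kwargs with pre_prompt/post_prompt), not code from this repository.

```python
import os
from typing import Any, Dict, List, Optional

# Hypothetical stand-in for the img_root value configured in the task YAML.
IMG_ROOT = "/path/to/your/image_or_video_data"


def doc_to_visual(doc: Dict[str, Any]) -> List[str]:
    # Returns file paths to stay dependency-free; a real task would likely
    # load images (or pass video paths) for the model.
    return [os.path.join(IMG_ROOT, doc["image"])]


def doc_to_text(doc: Dict[str, Any],
                lmms_eval_specific_kwargs: Optional[Dict[str, str]] = None) -> str:
    # Wrap the question with optional prompt prefix/suffix from the config.
    kwargs = lmms_eval_specific_kwargs or {}
    return f"{kwargs.get('pre_prompt', '')}{doc['question']}{kwargs.get('post_prompt', '')}"


def doc_to_target(doc: Dict[str, Any]) -> str:
    return doc["answer"]


def process_results(doc: Dict[str, Any], results: List[str]) -> Dict[str, float]:
    # Score the first generation with a case-insensitive exact match.
    prediction = results[0].strip().lower()
    return {"exact_match": float(prediction == doc_to_target(doc).strip().lower())}
```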

Please check the YAML file of each benchmark case by case and fill in img_root when local image/video assets are required.
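Instead of opening every YAML file, you can list the ones that still carry the placeholder path. A minimal sketch; find_unset_img_roots is a hypothetical helper that assumes task YAMLs live under lmms_eval/tasks and that the placeholder string is exactly the one quoted above.

```python
from pathlib import Path
from typing import List

# Placeholder value used in the released task YAML files.
PLACEHOLDER = "/path/to/your/image_or_video_data"


def find_unset_img_roots(tasks_dir: str = "lmms_eval/tasks") -> List[Path]:
    """List task YAML files under tasks_dir that still contain the placeholder."""
    root = Path(tasks_dir)
    if not root.is_dir():
        return []
    hits = []
    for yaml_path in sorted(root.rglob("*.yaml")):
        if PLACEHOLDER in yaml_path.read_text(encoding="utf-8"):
            hits.append(yaml_path)
    return hits


if __name__ == "__main__":
    for path in find_unset_img_roots():
        print(f"img_root not set: {path}")
```

Running it from the repository root before a batch run catches benchmarks that would otherwise fail on missing images partway through.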

Main Evaluation Script

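The main evaluation launcher is:

```bash
bash mimovl_docs/eval_mimo_vl_args.sh [disable_thinking]
```

Its positional arguments are the model path, the task name, and the output directory, followed by an optional disable_thinking flag, as in the examples below.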

Single-Task Evaluation

```bash
bash mimovl_docs/eval_mimo_vl_args.sh \
XiaomiMiMo/MiMo-Embodied-7B \
cvbench_boxed \
./eval_results
```

No-Think Evaluation

For tasks evaluated in no-think mode, run:

```bash
bash mimovl_docs/eval_mimo_vl_args.sh \
XiaomiMiMo/MiMo-Embodied-7B \
\
./eval_results \
true
```

This corresponds to:

```
disable_thinking_user=true
```
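The single-task and no-think invocations differ only in the trailing true. A minimal Python sketch that assembles either argv, assuming the argument order shown in the examples (model, task, output directory, optional true); build_eval_command is a hypothetical helper, not part of the repository.

```python
from typing import List

LAUNCHER = "mimovl_docs/eval_mimo_vl_args.sh"


def build_eval_command(model: str, task: str, output_dir: str,
                       disable_thinking: bool = False) -> List[str]:
    """Build the argv for one launcher run, in the order used by the examples."""
    cmd = ["bash", LAUNCHER, model, task, output_dir]
    if disable_thinking:
        # The no-think example passes a literal "true" as the fourth argument.
        cmd.append("true")
    return cmd


if __name__ == "__main__":
    cmd = build_eval_command("XiaomiMiMo/MiMo-Embodied-7B", "cvbench_boxed", "./eval_results")
    print(" ".join(cmd))
```

Passing the resulting list to subprocess.run(cmd, check=True) from the repository root launches the run without going through shell quoting.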

Multi-GPU / Multi-Node Evaluation

The launcher supports distributed evaluation through environment variables:

```bash
export NNODES=1
export NODE_RANK=0
export MASTER_ADDR=127.0.0.1
export MASTER_PORT=29500
export NPROC_PER_NODE=8
```

Then run:

```bash
bash mimovl_docs/eval_mimo_vl_args.sh \
\
\
```
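The README does not show how the launcher consumes these variables. Assuming it follows the common torchrun convention, the world size is NNODES × NPROC_PER_NODE and each process's global rank is NODE_RANK × NPROC_PER_NODE plus its local rank. Every node shares the same MASTER_ADDR and MASTER_PORT (the rank-0 host), while NODE_RANK differs per node. With the values exported above (one node, eight processes), this gives a world size of 8 and ranks 0 to 7. The strided shard function shows one common way to split evaluation documents across ranks. It is an illustration, not necessarily how lmms-eval partitions work, and both helpers are hypothetical.

```python
import os
from typing import List, Mapping, Sequence, Tuple, TypeVar

T = TypeVar("T")


def distributed_layout(env: Mapping[str, str]) -> Tuple[int, List[int]]:
    """Return (world_size, global ranks hosted on this node).

    Assumes the torchrun convention: global rank = NODE_RANK * NPROC_PER_NODE + local rank.
    """
    nnodes = int(env.get("NNODES", "1"))
    node_rank = int(env.get("NODE_RANK", "0"))
    nproc = int(env.get("NPROC_PER_NODE", "1"))
    if nnodes < 1 or nproc < 1 or not 0 <= node_rank < nnodes:
        raise ValueError("need NNODES >= 1, NPROC_PER_NODE >= 1, 0 <= NODE_RANK < NNODES")
    world_size = nnodes * nproc
    ranks = [node_rank * nproc + local_rank for local_rank in range(nproc)]
    return world_size, ranks


def shard(items: Sequence[T], rank: int, world_size: int) -> List[T]:
    """Strided split: rank r takes items r, r + world_size, r + 2 * world_size, ..."""
    return list(items[rank::world_size])


if __name__ == "__main__":
    world_size, ranks = distributed_layout(os.environ)
    print(f"world_size={world_size}, ranks on this node={ranks}")
```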

Batch Evaluation

To run multiple tasks sequentially, edit the task list in:

tools/submit/batch_run.py

Then launch:

```bash
python tools/submit/batch_run.py \
--input \
--eval_results_dir
```
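The internals of batch_run.py are not shown in this README, and the values for --input and --eval_results_dir are cut off above. As a rough stand-in, the sketch below runs the single-task launcher once per task, in order, and keeps going when a task fails so one broken benchmark does not block the rest. run_batch and its injectable runner parameter are hypothetical; the argument order is taken from the single-task examples.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

LAUNCHER = "mimovl_docs/eval_mimo_vl_args.sh"


def _run_subprocess(cmd: List[str]) -> int:
    # Execute the command for real and report its exit code.
    return subprocess.run(cmd).returncode


def run_batch(tasks: Sequence[str], model: str, output_dir: str,
              disable_thinking: bool = False,
              runner: Optional[Callable[[List[str]], int]] = None) -> List[str]:
    """Run the launcher once per task, in order; return the tasks that failed."""
    run = runner or _run_subprocess
    failures = []
    for task in tasks:
        cmd = ["bash", LAUNCHER, model, task, output_dir]
        if disable_thinking:
            cmd.append("true")
        if run(cmd) != 0:
            # Record the failure and continue with the next task.
            failures.append(task)
    return failures
```

The runner parameter exists so the loop can be exercised with a fake runner before spending GPU time on a real batch.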

To disable thinking mode in batch evaluation:

```bash
python tools/submit/batch_run.py \
--input \
--eval_results_dir \…
```
