ModelXiaomi (MiMo)Xiaomi (MiMo)published Nov 19, 2025seen 5d

XiaomiMiMo/MiMo-Embodied-7B

Open original ↗

Captured source

source ↗
published Nov 19, 2025seen 5dcaptured 9hhttp 200method plaintask image-text-to-textlicense mitlibrary transformersparams 8.3Bdownloads 1.1klikes 66

I. Introduction

MiMo-Embodied, a powerful cross-embodied vision-language model that shows state-of-the-art performance in both autonomous driving and embodied AI tasks, the first open-source VLM that integrates these two critical areas, significantly enhancing understanding and reasoning in dynamic physical environments.

II. Model Capabilities

III. Model Details

IV. Evaluation Results

MiMo-Embodied demonstrates superior performance across 17 benchmarks in three key embodied AI capabilities: Task Planning, Affordance Prediction, and Spatial Understanding, significantly surpassing existing open-source embodied VLM models and rivaling closed-source models.

Additionally, MiMo-Embodied excels in 12 autonomous driving benchmarks across three key capabilities: Environmental Perception, Status Prediction, and Driving Planning—significantly outperforming both existing open-source and closed-source VLM models, as well as proprietary VLM models.

Moreover, evaluation on 8 general visual understanding benchmarks confirms that MiMo-Embodied retains and even strengthens its general capabilities, showing that domain-specialized training enhances rather than diminishes overall model proficiency.

Embodied AI Benchmarks

Affordance & Planning

Spatial Understanding

Autonomous Driving Benchmarks

Single-View Image & Multi-View Video

Multi-View Image & Single-View Video

General Visual Understanding Benchmarks

> Results marked with \* are obtained using our evaluation framework.

V. Case Visualization

Embodied AI

Affordance Prediction

Task Planning

Spatial Understanding

Autonomous Driving

Environmental Perception

Status Prediction

Driving Planning

Real-world Tasks

Embodied Navigation

Embodied Manipulation

VI. Citation

@misc{hao2025mimoembodiedxembodiedfoundationmodel,
title={MiMo-Embodied: X-Embodied Foundation Model Technical Report},
author={Xiaomi Embodied Intelligence Team},
year={2025},
eprint={2511.16518},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2511.16518},
}

Notability

notability 5.0/10

New embodied model release, moderate traction.