ByteDance-Seed/SpatialTree
Description: CVPR 2026 (Highlight); Spatial Intelligence; MLLMs
Language: Shell
Stars: 45
Forks: 0
Open issues: 1
Created: 2026-01-04T07:04:01Z
Pushed: 2026-02-24T01:34:37Z
Default branch: main
Fork: no
Archived: no
README:
SpatialTree: How Spatial Abilities Branch Out in MLLMs
SpatialTree is a cognitive-science-inspired hierarchy and benchmark for evaluating spatial abilities in Multimodal Large Language Models (MLLMs). It organizes spatial capability into four levels spanning 27 sub-abilities: perception (L1), mental mapping (L2), simulation (L3), and agentic competence (L4).
🌳 SpatialTree Hierarchy
Unlike previous benchmarks, our hierarchy progresses from basic perception to complex agentic interaction.
💡 Key Findings
- Hierarchy Matters: Higher-level skills (L2-L4) are strongly correlated, while L1 skills are largely orthogonal.
- Transfer Dynamics: Strong cross-level transfer exists from low-level to high-level abilities, whereas transfer within L1 can be negative.
- Auto-Think: Naive chain-of-thought (CoT) reasoning helps on complex tasks but hurts intuitive perception. We propose an auto-think strategy that suppresses unnecessary deliberation, enabling reinforcement learning (RL) to improve performance consistently across levels.
📊 Evaluation
We provide evaluation scripts integrated with lmms-eval to test various Large Multimodal Models (LMMs) on SpatialTreeBench.
Installation
First, install lmms-eval and the necessary dependencies:
git clone https://github.com/EvolvingLMMs-Lab/lmms-eval.git
cd lmms-eval
pip install -e .
How to Run
To evaluate a model on SpatialTreeBench via an OpenAI-compatible interface, use the following commands. The task is registered as spatialtreebench.
Evaluate GPT-4o (via OpenAI API)
Configure the judge model settings for LLM-as-a-Judge:
export MODEL_VERSION="gpt-4o-2024-05-13"
export API_TYPE= # openai, azure, async_openai, or async_azure
export OPENAI_API_KEY=xxxx
export OPENAI_API_URL= # Default: https://api.openai.com/v1/chat/completions/v1
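Because a misconfigured judge typically surfaces only after the benchmark has started, it can help to check these variables up front. The function below is a minimal sketch; `check_judge_env` is a hypothetical helper, not part of SpatialTree or lmms-eval. It only checks that `MODEL_VERSION` and `OPENAI_API_KEY` are non-empty and that `API_TYPE` is one of the four values listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SpatialTree or lmms-eval): sanity-check the
# LLM-as-a-Judge environment variables before launching a long evaluation run.
check_judge_env() {
  # MODEL_VERSION and OPENAI_API_KEY must be set and non-empty.
  if [ -z "${MODEL_VERSION:-}" ]; then
    echo "MODEL_VERSION is not set" >&2
    return 1
  fi
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "OPENAI_API_KEY is not set" >&2
    return 1
  fi
  # API_TYPE must be one of the four values listed in the README.
  case "${API_TYPE:-}" in
    openai|azure|async_openai|async_azure) ;;
    *)
      echo "API_TYPE must be openai, azure, async_openai, or async_azure (got '${API_TYPE:-}')" >&2
      return 1
      ;;
  esac
  return 0
}
```

For example, `check_judge_env && bash examples/models/openai_compatible.sh` starts the run only if the check passes. `OPENAI_API_URL` is deliberately left alone, since an empty value selects the default endpoint.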
You can export these environment variables in your shell or set them directly in the script. Then run the evaluation:
bash examples/models/openai_compatible.sh
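The README does not show what `examples/models/openai_compatible.sh` contains. If you prefer to call lmms-eval directly, a wrapper along these lines may work, but treat it as a sketch under assumptions. `--tasks spatialtreebench` and `--output_path` come from this README. The model name `openai_compatible`, its `model_version` argument, `--batch_size 1`, and the use of `--log_samples` to obtain per-sample outputs are our guesses, so compare them with the shipped script before relying on them. `run_spatialtreebench` and `DRY_RUN` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around lmms-eval for SpatialTreeBench.
# Assumptions: the `openai_compatible` model name, its `model_version` argument,
# and the batch size mirror examples/models/openai_compatible.sh; verify first.
run_spatialtreebench() {
  local model_version="$1"
  local output_path="${2:-./logs/spatialtreebench}"
  # Build the command as an array so arguments with spaces survive intact.
  local -a cmd=(
    python -m lmms_eval
    --model openai_compatible
    --model_args "model_version=${model_version}"
    --tasks spatialtreebench
    --batch_size 1
    --log_samples
    --output_path "${output_path}"
  )
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Dry run: print the command instead of executing it.
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Running `DRY_RUN=1 run_spatialtreebench gpt-4o-2024-05-13` previews the command without calling the API. Dropping `DRY_RUN` executes it with the judge variables already exported.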
Metrics & Results
After the evaluation finishes, the results will be saved in the directory specified by --output_path.
The hierarchical scores will also be displayed in the terminal as follows:
--- SpaTreeBench Hierarchical Scores ---
SpaTree
├── L1
│   ├── Geometry
│   │   ├── Distance
│   │   ├── Shape
│   │   └── Size
│   ├── Localization
│   │   ├── 3D Detection
│   │   └── 3D Grounding
│   ├── Motion
│   │   ├── Allo
│   │   └── Ego
│   ├── Orientation
│   │   ├── Gravity
│   │   └── Object Orientation
│   └── Relation
│       ├── Correspondence
│       └── Relative Direction
├── L2
│   ├── Memory
│   │   ├── Cognitive Map
│   │   └── Memory Retrieval
│   └── Underst.
│       ├── Affordance
│       ├── Motion Understanding
│       ├── Perspective Taking
│       ├── Relation Understanding
│       └── Spatial Caption
├── L3
│   ├── Caus. Reas.
│   │   ├── Dynamics
│   │   ├── Geometry Puzzles
│   │   └── Relation
│   └── Seq. Plan.
│       ├── Operation
│       └── Route
└── L4
    ├── Goal-Driven Execution
    │   ├── Agentic Navigation
    │   └── Robotic Arm
    └── Open-world Exploration
        ├── Knowledge Acquisition
        └── Self-Goaling
----------------------------------------
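As a quick cross-check of the "27 sub-abilities" figure, the leaves of the tree above can be transcribed into Bash arrays and counted per level. The names are copied from the tree with underscores in place of spaces. The arrays are for illustration only and are not part of the SpatialTree code.

```shell
#!/usr/bin/env bash
# Leaf sub-abilities per level, transcribed from the SpatialTree hierarchy
# (underscores replace spaces so each name is a single array element).
L1=(Distance Shape Size 3D_Detection 3D_Grounding Allo Ego Gravity Object_Orientation Correspondence Relative_Direction)
L2=(Cognitive_Map Memory_Retrieval Affordance Motion_Understanding Perspective_Taking Relation_Understanding Spatial_Caption)
L3=(Dynamics Geometry_Puzzles Relation Operation Route)
L4=(Agentic_Navigation Robotic_Arm Knowledge_Acquisition Self_Goaling)

# Per-level counts: L1: 11, L2: 7, L3: 5, L4: 4
printf 'L1: %d\nL2: %d\nL3: %d\nL4: %d\n' "${#L1[@]}" "${#L2[@]}" "${#L3[@]}" "${#L4[@]}"
total=$(( ${#L1[@]} + ${#L2[@]} + ${#L3[@]} + ${#L4[@]} ))
echo "total: ${total}"   # prints "total: 27"
```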
You can check results.json for aggregated scores and samples.json for detailed model inputs and predictions.
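Depending on the lmms-eval version, result files may sit in a model-specific subdirectory of `--output_path` and carry a timestamp prefix. The helper below is a hypothetical sketch that searches recursively for any file whose name ends in `results.json` and prints the most recently modified one. The name pattern is an assumption chosen to match both `results.json` and prefixed variants.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the newest '*results.json' file under the
# directory passed as --output_path. Returns 1 if no such file exists.
latest_results() {
  local output_path="$1" newest="" f
  # NUL-delimited find output keeps paths with spaces intact.
  while IFS= read -r -d '' f; do
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest="$f"
    fi
  done < <(find "$output_path" -type f -name '*results.json' -print0)
  [ -n "$newest" ] || return 1
  printf '%s\n' "$newest"
}
```

For example, `python -m json.tool "$(latest_results ./logs)"` pretty-prints the aggregated scores from the latest run.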
Dataset Loading
The script automatically downloads the dataset from Hugging Face (SpatialTree-Bench).
To run the evaluation offline, download the dataset beforehand and set dataset_path in lmms_eval/tasks/spatialtreebench/spatialtreebench.yaml to your local copy.
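The YAML edit can be scripted. This is a minimal sketch assuming that `dataset_path` is a top-level key on its own line and that lmms-eval accepts a local directory there. `set_local_dataset_path` is a hypothetical helper. The README does not give the dataset's Hugging Face ID, so `<dataset-repo-id>` in the download comment is a placeholder to fill in yourself.

```shell
#!/usr/bin/env bash
# Download step (run once while online). <dataset-repo-id> is a placeholder
# for the Hugging Face ID of SpatialTree-Bench linked from the README:
#   huggingface-cli download <dataset-repo-id> --repo-type dataset --local-dir /data/SpatialTree-Bench

# Hypothetical helper: point dataset_path in the task YAML at a local copy.
# Assumes dataset_path is a top-level key and local_dir contains no '|' or '&'.
set_local_dataset_path() {
  local yaml="$1" local_dir="$2"
  grep -q '^dataset_path:' "$yaml" || { echo "no dataset_path key in $yaml" >&2; return 1; }
  # -i.bak works with both GNU and BSD sed and keeps a backup of the original.
  sed -i.bak "s|^dataset_path:.*|dataset_path: ${local_dir}|" "$yaml"
}
```

Usage: `set_local_dataset_path lmms_eval/tasks/spatialtreebench/spatialtreebench.yaml /data/SpatialTree-Bench`. Exporting `HF_HUB_OFFLINE=1` as well stops `huggingface_hub` from attempting network requests during the run.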
---
🖊️ Citation
If you find this work helpful, please cite our paper:
@article{xiao2025spatialtree,
title={SpatialTree: How Spatial Abilities Branch Out in MLLMs},
author={Xiao, Yuxi and Li, Longfei and Yan, Shen and Liu, Xinhang and Peng, Sida and Wei, Yunchao and Zhou, Xiaowei and Kang, Bingyi},
journal={arXiv preprint arXiv:2512.20617},
year={2025}
}