RepoByteDance (Doubao/Seed)ByteDance (Doubao/Seed)published May 11, 2025seen 5d

ByteDance-Seed/Seed1.5-VL

Jupyter Notebook

Open original ↗

Captured source

source ↗
published May 11, 2025seen 5dcaptured 12hhttp 200method plain

ByteDance-Seed/Seed1.5-VL

Description: Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

Language: Jupyter Notebook

License: Apache-2.0

Stars: 1580

Forks: 66

Open issues: 25

Created: 2025-05-11T06:00:32Z

Pushed: 2025-06-14T19:58:52Z

Default branch: main

Fork: no

Archived: no

README:

🤗 HuggingFace Demo&nbsp&nbsp | &nbsp&nbsp🌐 Homepage&nbsp&nbsp | &nbsp&nbsp📄 arXiv

Today, we are excited to introduce Seed1.5-VL 🚀, a powerful and efficient vision-language foundation model designed for advanced general-purpose multimodal understanding and reasoning.

🌟 Highlights

  • 🧠 Efficient Powerhouse: Achieves top performance with a relatively modest architecture, 532M vision encoder & 20B active parameter MoE LLM.
  • 🏆 Exceptional Benchmark Performance: Delivers State-of-the-Art results on 38 out of 60 public VLM benchmarks, demonstrating broad competence.
  • 💡 Versatile Capabilities: Excels across diverse capabilities including complex reasoning (e.g., visual puzzles like Rebus), OCR, diagram understanding, visual grounding, 3D spatial understanding, and video comprehension.
  • 🤖 Advanced Agent-Centric Abilities: Demonstrates leading performance in interactive agent tasks, showcasing strong capabilities in GUI control and gameplay.

This repository offers usage cookbook and best practices designed to help developers effectively use Seed1.5-VL.

📢 News

  • 2025-05-13: We have deployed our Seed1.5-VL on 🤗 HuggingFace Spaces, Welcome to try out our model!
  • 2025-05-12: We have released the [Seed1.5-VL Technical Report](./Seed1.5-VL-Technical-Report.pdf).
  • 2025-05-12: We are extremely delighted to release the flagship Seed1.5-VL on Volcano Engine. The Model ID is doubao-1-5-thinking-vision-pro-250428. You can try it now!

📮 Notice

Call for Bad Cases: If you have encountered any cases where the model performs poorly, we would greatly appreciate it if you could share them in the issue [https://github.com/ByteDance-Seed/Seed1.5-VL/issues/12]

📖 Seed1.5-VL Cookbook

The Seed1.5-VL cookbook is designed to help you start using the Seed1.5-VL API with diverse code samples. Our flagship Seed1.5-VL has been deployed on Volcano Engine. After obtaining your API_KEY, you can use the examples in this cookbook to rapidly understand and leverage the diverse capabilities of our Seed1.5-VL.

Quick Start

  • [x] Cookbook for online/offline [Gradio Demo](./GradioDemo)
  • [x] Cookbook for turning on/off [LongCoT](./longCoT)
  • [x] Cookbook for [2D Grounding](./Grounding)
  • [x] Cookbook for [3D Understanding](./3D-Understanding)
  • [x] Cookbook for [Video Understanding](./Video)
  • [X] Cookbook for [GUI Agents](./GUI)

Citations

If you Seed1.5-VL useful in your research or applications, please consider giving us a star 🌟 and citing it by the following BibTeX entry.

@article{seed2025seed1_5vl,
title={Seed1.5-VL Technical Report},
author={ByteDance Seed Team},
journal={arXiv preprint arXiv:2505.07062},
year={2025}
}

License

This repo is under [Apache-2.0 License](./LICENSE).

About ByteDance Seed Team

Founded in 2023, ByteDance Seed Team is dedicated to crafting the industry's most advanced AI foundation models. The team aspires to become a world-class research team and make significant contributions to the advancement of science and society.

Excerpt shown — open the source for the full document.

Notability

notability 7.0/10

Notable VL model by major company, moderate stars