google-deepmind/physics-IQ-benchmark

Description: Benchmarking physical understanding in generative video models

Language: Python

License: NOASSERTION

Stars: 302

Forks: 33

Open issues: 3

Created: 2024-12-14T05:07:19Z

Pushed: 2026-05-27T13:13:57Z

Default branch: main

Fork: no

Archived: no

README:

[Step A: Generating Videos](#step-a-generating-videos-for-physics-iq-test-cases-based-on-video-model) | [Step B: Evaluating Generated Videos](#step-b-evaluating-generated-videos-on-physics-iq-to-generate-benchmark-scores) | [Leaderboard](#leaderboard) | [Citation](#citation) | [License](#license-and-disclaimer)

Physics-IQ: Benchmarking physical understanding in generative video models

Physics-IQ is a high-quality, realistic, and comprehensive benchmark dataset for evaluating physical understanding in generative video models.

Project website: physics-iq.github.io

Key Features:

  • Real-world videos: All videos are captured with high-quality cameras, not rendered.
  • Diverse scenarios: Covers a wide range of physical phenomena, including collisions, fluid dynamics, gravity, material properties, light, shadows, magnetism, and more.
  • Multiple perspectives: Each scenario is filmed from 3 different angles.
  • Variations: Each scenario is recorded twice to capture natural physical variations.
  • High resolution and frame rate: Videos are recorded at 3840 × 2160 resolution and 30 frames per second.
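The recording setup listed above implies a fixed number of recordings per scenario. The following is a minimal sketch of that arithmetic. The label format (`view`, `take`) and the scenario name are hypothetical placeholders for illustration, not the dataset's actual file naming.

```python
from itertools import product

# Values taken from the feature list above.
PERSPECTIVES = 3                     # each scenario is filmed from 3 angles
TAKES = 2                            # each scenario is recorded twice
WIDTH, HEIGHT, FPS = 3840, 2160, 30  # recording resolution and frame rate


def recordings_for(scenario: str) -> list[str]:
    """Return one hypothetical label per (perspective, take) recording."""
    return [
        f"{scenario}_view{view}_take{take}"
        for view, take in product(range(1, PERSPECTIVES + 1), range(1, TAKES + 1))
    ]


labels = recordings_for("example-scenario")  # hypothetical scenario name
pixels_per_second = WIDTH * HEIGHT * FPS     # raw pixels captured per second

print(len(labels))        # 3 angles x 2 takes = 6 recordings per scenario
print(pixels_per_second)  # 3840 * 2160 * 30
```

Each scenario therefore yields six recordings, which is what allows a generated video to be compared against more than one real outcome of the same physical setup.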

---

Leaderboard

The best possible score on Physics-IQ is 100.0%. This score would be achieved by physically realistic videos that differ only in physical randomness but adhere to all tested principles of physics.

If you test your model on Physics-IQ and would like your score, paper, or model to be featured in this table, feel free to open a pull request that adds a row, and we'll be happy to include it!

| # | Model | input type | Physics-IQ score | date added (YYYY-MM-DD) |
| -- | --- | --- | --- | --- |
| 1 | Cosmos3-Super + WMReward (BoN) reported here | multiframe (v2v) | 63.4 % :1st_place_medal: v2v | 2026-05-26 |
| 2 | Magi-1 + WMReward (BoN) reported here | multiframe (v2v) | 62.6 % :2nd_place_medal: v2v | 2025-10-28 |
| 3 | Cosmos3-Super reported here | multiframe (v2v) | 59.7 % :3rd_place_medal: v2v | 2026-05-26 |
| 4 | Cosmos3-Nano + WMReward (BoN) reported here | multiframe (v2v) | 57.7 % | 2026-05-26 |
| 5 | Magi-1 reported here | multiframe (v2v) | 56.0 % | 2025-04-21 |
| 6 | Cosmos3-Nano reported here | multiframe (v2v) | 50.2 % | 2026-05-26 |
| 7 | Cosmos3-Super + WMReward (BoN) reported here | i2v | 48.9 % :1st_place_medal: i2v | 2026-05-26 |
| 8 | Sora2 + WMReward (BoN) reported here | i2v | 46.4 % :2nd_place_medal: i2v | 2026-04-01 |
| 9 | Wan2.2 + WMReward (BoN) reported here | i2v | 44.4 % :3rd_place_medal: i2v | 2026-04-01 |
| 10 | Cosmos3-Super reported here | i2v | 43.8 % | 2026-05-26 |
| 11 | Cosmos3-Nano + WMReward (BoN) reported here | i2v | 43.8 % | 2026-05-26 |
| 12 | Sora2 reported here | i2v | 42.3 % | 2026-04-01 |
| 13 | Cosmos3-Nano reported here | i2v | 40.2 % | 2026-05-26 |
| 14 | Wan2.2 reported here | i2v | 38.3 % | 2026-04-01 |
| 15 | Magi-1 + WMReward (BoN) reported here | i2v | 36.9 % | 2025-10-28 |
| 16 | Video-GPT reported here | multiframe (v2v) | 35.0 % | 2025-05-22 |
| 17 | CogVideoX-5b reported here | i2v | 32.3 % | 2026-01-06 |
| 18 | Magi-1 reported here | i2v | 30.2 % | 2025-04-21 |
| 19 | VideoPoet reported here | multiframe (v2v) | 29.5 % | 2025-02-19 |
| 20 | Lumiere reported here | multiframe (v2v) | 23.0 % | 2025-02-19 |
| 21 | Runway Gen 3 reported here | i2v | 22.8 % | 2025-02-19 |
| 22 | VideoPoet reported here | i2v | 20.3 % | 2025-02-19 |
| 23 | Lumiere reported here | i2v | 19.0 % | 2025-02-19 |
| 24 | Stable Video Diffusion reported here | i2v | 14.8 % | 2025-02-19 |
| 25 | Pika reported here | i2v | 13.0 % | 2025-02-19 |
| 26 |…
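
The medals in the leaderboard are awarded separately for each input type (multiframe v2v and i2v) rather than across the whole table. The sketch below illustrates that grouping on a handful of rows copied from the leaderboard. The function name `top_per_input_type` and the row layout are assumptions made for illustration and are not part of the benchmark's code.

```python
from collections import defaultdict

# (model, input type, Physics-IQ score in %) copied from a subset of leaderboard rows.
ROWS = [
    ("Cosmos3-Super + WMReward (BoN)", "multiframe (v2v)", 63.4),
    ("Magi-1 + WMReward (BoN)", "multiframe (v2v)", 62.6),
    ("Cosmos3-Super", "multiframe (v2v)", 59.7),
    ("Cosmos3-Nano + WMReward (BoN)", "multiframe (v2v)", 57.7),
    ("Cosmos3-Super + WMReward (BoN)", "i2v", 48.9),
    ("Sora2 + WMReward (BoN)", "i2v", 46.4),
    ("Wan2.2 + WMReward (BoN)", "i2v", 44.4),
    ("Sora2", "i2v", 42.3),
]


def top_per_input_type(rows, k=3):
    """Group rows by input type and return the k highest-scoring models in each group."""
    grouped = defaultdict(list)
    for model, input_type, score in rows:
        grouped[input_type].append((score, model))
    return {
        input_type: [
            model
            for _, model in sorted(entries, key=lambda e: e[0], reverse=True)[:k]
        ]
        for input_type, entries in grouped.items()
    }


podium = top_per_input_type(ROWS)
for input_type, models in podium.items():
    print(input_type, models)
```

Ranking within each input type keeps the comparison fair, since multiframe models are conditioned on several frames while i2v models see only a single image.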

