google-deepmind/perception_test
Language: Jupyter Notebook
License: Apache-2.0
Stars: 250
Forks: 14
Open issues: 3
Created: 2022-10-06T15:55:15Z
Pushed: 2026-06-19T13:09:02Z
Default branch: main
Fork: no
Archived: no
README:
Perception Test: A Diagnostic Benchmark for Multimodal Video Models
News
The Fourth Perception Test Challenge will be organised as an ECCV 2026 workshop; challenge website here.
Overview
| | |
|-----|----|
| Quickstart visualisation notebook | |
| Dataset Explorer | Dataset Explorer |
| Download data | Download section here |
| Evaluation scripts (including data loader, dummy baseline, evaluation metrics) | multiple-choice video QA, object tracking, action localisation, point tracking, sound localisation, grounded video QA |
| Challenges and evaluation servers | multiple-choice video QA, object tracking, action localisation, point tracking, sound localisation, grounded video QA |
Perception Test: A Diagnostic Benchmark for Multimodal Video Models is a multimodal benchmark designed to comprehensively evaluate the perception and reasoning skills of multimodal video models. The Perception Test dataset introduces real-world videos designed to show perceptually interesting situations and defines multiple tasks (object and point tracking, action and sound localisation, multiple-choice and grounded video question-answering) that require understanding of memory, abstract patterns, physics, and semantics, across visual, audio, and text modalities.
In this repository, you will find:
- A summary of the Perception Test and the associated challenge
- A detailed description of the data and annotations in the Perception Test (interactive demo notebook [here](https://github.com/deepmind/perception_test/blob/main/data_visualisation.ipynb))
- Details about how to download the data and annotations in the Perception Test (download section [here](https://github.com/deepmind/perception_test#download-the-data-and-annotations))
- Metrics for evaluating the performance on the different tasks (metrics section [here](https://github.com/deepmind/perception_test#metrics))
- Dummy baselines showcasing how to evaluate models on each of the tasks (baselines section [here](https://github.com/deepmind/perception_test#baselines))
5-minute summary of the Perception Test

*Try the Perception Test for yourself by accessing this quiz.*
For more example videos in the Perception Test, check out this playlist.
Download the data and annotations
The Perception Test dataset can be downloaded as zip files containing:
- annotations in JSON format
- videos (including audio) as MP4 files
- audio-only files in WAV format
- pre-computed features for the action localisation and sound localisation tasks.
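As a minimal sketch of working with one of the annotation archives after downloading it, the snippet below extracts a zip file and loads every JSON file it contains. The helper name `load_annotations` is hypothetical, and the internal layout of the archive (how many JSON files it holds and how they are structured) is an assumption, not something this README specifies.

```python
import json
import zipfile
from pathlib import Path


def load_annotations(zip_path, extract_dir):
    """Extract an annotations archive and load every JSON file found in it.

    Returns a dict mapping each JSON file's path (relative to extract_dir)
    to its parsed content. The archive layout is an assumption: any number
    of .json files, possibly nested in sub-folders.
    """
    extract_dir = Path(extract_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(extract_dir)

    annotations = {}
    for json_file in sorted(extract_dir.rglob("*.json")):
        key = json_file.relative_to(extract_dir).as_posix()
        with open(json_file, encoding="utf-8") as f:
            annotations[key] = json.load(f)
    return annotations
```

For example, `load_annotations("sample_annotations.zip", "data/sample")` would unpack the sample labels into `data/sample` (a hypothetical target folder) and return their parsed contents.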
Full Dataset Splits
| Task | Split | Videos | Audio | Labels |
|---------------------------|--------|--------|--------|-------|
| Sample | All | sample_videos.zip (214.9MB) | sample_audios.zip (83.9MB) | sample_annotations.zip (3MB) |
| All Tasks | Train | train_videos.zip (26.5GB) | train_audios.zip (12.3GB) | train_annotations.zip (30.6MB) |
| All Tasks | Valid | valid_videos.zip (70.2GB) | valid_audios.zip (33.1GB) | valid_annotations.zip (81.5MB) |
| All Tasks | Test | test_videos.zip (41.8GB) | test_audios.zip (19.3GB) | test_annotations.zip (633.9kB) |
*In test videos where the end of the video gives away the answer to some questions (e.g. in cup games, where the hidden object is at the end), we cut the end of the video. For the train and validation splits, we provide the frame ID where the cut should be made: cut_frame_mapping_train.json, cut_frame_mapping_valid.json.
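A rough sketch of applying such a cut to train or validation videos is shown below. It assumes the mapping file is a flat JSON object from video ID to frame index and that the cut frame itself is excluded; neither detail is confirmed here, so check the actual file before relying on it. The function names and the example video ID are hypothetical.

```python
import json


def load_cut_frames(mapping_path):
    """Load a cut-frame mapping, assumed to be {video_id: frame_id}."""
    with open(mapping_path, encoding="utf-8") as f:
        return json.load(f)


def trim_frames(frames, video_id, cut_frames):
    """Drop the frames from the cut point onwards.

    Videos without an entry in the mapping are returned unchanged.
    Assumption: the cut frame index is exclusive (frames[:cut] are kept).
    """
    cut = cut_frames.get(video_id)
    if cut is None:
        return frames
    return frames[: int(cut)]


# Usage with an in-memory mapping and a hypothetical video ID.
example_mapping = {"video_0001": 4}
kept = trim_frames(list(range(10)), "video_0001", example_mapping)
```

Trimming at load time keeps the original MP4 files untouched, so the same videos can still be used in full for tasks where the ending does not reveal the answer.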
Challenge Downloads
Video IDs\ Since some of the...
Notability
notability 6.0/10: Solid new perception repo from DeepMind, moderate stars.