What does this repo signal mean?

Google (DeepMind / Gemini) published google-deepmind/perception_test (Jupyter Notebook). This repository signal exposes tooling, eval, infrastructure, or model-adjacent work that may not yet have appeared in a launch post. High-signal details: repo google-deepmind/perception_test · language Jupyter Notebook · Solid new perception repo from DeepMind, moderate stars. onlylabs links this event to 1 captured evidence page and 6 related repo signals.

Captured source

GitHub: github.com/google-deepmind/perception_test

google-deepmind/perception_test repository metadata

Published: Oct 6, 2022 · Seen: 6d · Captured: 6d · HTTP: 200 · Method: plain

google-deepmind/perception_test

Language: Jupyter Notebook

License: Apache-2.0

Stars: 250

Forks: 14

Open issues: 3

Created: 2022-10-06T15:55:15Z

Pushed: 2026-06-19T13:09:02Z

Default branch: main

Fork: no

Archived: no

README:

Perception Test: A Diagnostic Benchmark for Multimodal Video Models

News

The Fourth Perception Test Challenge will be organised as an ECCV2026 workshop, challenge website here.

Overview

| | |
|-----|----|
| Quickstart visualisation notebook | [Open In Colab](https://github.com/deepmind/perception_test/blob/main/data_visualisation.ipynb) |
| Dataset Explorer | Dataset Explorer |
| Download data | Download section here |
| Evaluation scripts (including data loader, dummy baseline, evaluation metrics) | multiple-choice video QA, object tracking, action localisation, point tracking, sound localisation, grounded video QA |
| Challenges and evaluation servers | multiple-choice video QA, object tracking, action localisation, point tracking, sound localisation, grounded video QA |

Perception Test: A Diagnostic Benchmark for Multimodal Video Models is a multimodal benchmark designed to comprehensively evaluate the perception and reasoning skills of multimodal video models. The Perception Test dataset introduces real-world videos designed to show perceptually interesting situations and defines multiple tasks (object and point tracking, action and sound localisation, multiple-choice and grounded video question-answering) that require understanding of memory, abstract patterns, physics, and semantics, across visual, audio, and text modalities.

In this repository, you will find:

- A summary of the Perception Test and the associated challenge
- A detailed description of the data and annotations in the Perception Test (interactive demo notebook [here](https://github.com/deepmind/perception_test/blob/main/data_visualisation.ipynb))
- Details about how to download the data and annotations in the Perception Test (download section [here](https://github.com/deepmind/perception_test#download-the-data-and-annotations))
- Metrics for evaluating the performance on the different tasks (metrics section [here](https://github.com/deepmind/perception_test#metrics))
- Dummy baselines showcasing how to evaluate models on each of the tasks (baselines section [here](https://github.com/deepmind/perception_test#baselines))

5-minute summary of the Perception Test

[Perception Test Overview Presentation](https://youtu.be/8BiajMOBWdk)

*Try the Perception Test for yourself by accessing this quiz.*

For more example videos in the Perception Test, check out this playlist.

Download the data and annotations

The Perception Test dataset can be downloaded as zip files containing:

- annotations in JSON format
- videos (including audio) as MP4 files
- audio-only files in WAV format
- pre-computed features for the action localisation and sound localisation tasks
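
As a minimal sketch of working with the annotation archives, the snippet below reads every JSON file out of a zip that has already been downloaded locally. The internal layout of the archives is not described in this excerpt, so the helper name `load_json_from_zip` and the example path are hypothetical, and the code assumes only that the archive holds one or more `.json` members.

```python
import json
import zipfile


def load_json_from_zip(zip_path):
    """Return {member_name: parsed_json} for every .json file in the archive.

    Hypothetical helper: it assumes nothing about the annotation schema,
    only that the members ending in .json are valid JSON documents.
    """
    loaded = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".json"):
                # archive.open yields a binary file object; json.load accepts it.
                with archive.open(name) as handle:
                    loaded[name] = json.load(handle)
    return loaded


if __name__ == "__main__":
    # Example path only; point this at wherever the zip was saved.
    annotations = load_json_from_zip("sample_annotations.zip")
    for member, content in annotations.items():
        print(member, type(content).__name__, len(content))
```

Reading directly from the archive avoids unpacking it to disk, which may be convenient for the small annotation files; the much larger video and audio archives would more plausibly be extracted once and read from disk.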

Full Dataset Splits

| Task | Split | Videos | Audio | Labels |
|---------------------------|--------|--------|--------|-------|
| Sample | All | sample_videos.zip (214.9MB) | sample_audios.zip (83.9MB) | sample_annotations.zip (3MB) |
| All Tasks | Train | train_videos.zip (26.5GB) | train_audios.zip (12.3GB) | train_annotations.zip (30.6MB) |
| All Tasks | Valid | valid_videos.zip (70.2GB) | valid_audios.zip (33.1GB) | valid_annotations.zip (81.5MB) |
| All Tasks | Test | test_videos.zip (41.8GB) | test_audios.zip (19.3GB) | test_annotations.zip (633.9kB) |

*In test videos where the end of the video gives away the answer to some questions (e.g. in cup games, where the hidden object is at the end), we cut the end of the video. For the validation and train splits, we provide the frame ID at which the cut should be made: cut_frame_mapping_train.json, cut_frame_mapping_valid.json.
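
The cut described above could be applied to train or validation videos roughly as follows. This is a sketch under stated assumptions: the schema of the cut-frame mapping files is not shown in this excerpt, so the code assumes a flat JSON object mapping each video ID to an integer frame index, and assumes that frames from that index onward are the ones to drop. The function names `load_cut_mapping` and `trim_frames` are hypothetical.

```python
import json


def load_cut_mapping(path):
    """Load a cut-frame mapping file.

    ASSUMPTION: the file is a flat JSON object {video_id: frame_index}.
    Check the actual file before relying on this.
    """
    with open(path, "r", encoding="utf-8") as handle:
        return {video_id: int(frame) for video_id, frame in json.load(handle).items()}


def trim_frames(frames, video_id, cut_mapping):
    """Return the frames before the cut point for this video.

    ASSUMPTION: the cut frame itself and everything after it is dropped.
    Videos without an entry in the mapping are returned unchanged.
    """
    cut = cut_mapping.get(video_id)
    if cut is None:
        return frames
    return frames[:cut]


if __name__ == "__main__":
    # Toy data standing in for decoded frames; the IDs are made up.
    mapping = {"video_a": 4}
    frames = list(range(10))
    print(trim_frames(frames, "video_a", mapping))
    print(len(trim_frames(frames, "video_b", mapping)))
```

Whether the cut frame is inclusive or exclusive is a detail to confirm against the repository's own data loader before using trimmed videos for evaluation.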

Challenge Downloads

Video IDs\ Since some of the...

Excerpt shown — open the source for the full document.

Notability

notability 6.0/10

Solid new perception repo from DeepMind, moderate stars.