google-deepmind/tapnet
Description: Tracking Any Point (TAP)
Language: Jupyter Notebook
License: Apache-2.0
Stars: 1926
Forks: 184
Open issues: 29
Created: 2022-11-03T14:59:43Z
Pushed: 2026-06-24T21:36:19Z
Default branch: main
Fork: no
Archived: no
README:
Tracking Any Point (TAP)
[`TAP-Vid`] [`TAPIR`] [`RoboTAP`] [`Blog Post`] [`BootsTAP`] [`TAPVid-3D`] [`TAPNext`] [`TRAJAN`] [`TAPNext++`]
https://github.com/google-deepmind/tapnet/assets/4534987/9f66b81a-7efb-48e7-a59c-f5781c35bebc
Welcome to the official Google DeepMind repository for Tracking Any Point (TAP), home of the TAP-Vid and TAPVid-3D datasets, our top-performing TAPIR model, and our RoboTAP extension.
- TAP-Vid is a benchmark for models that perform the Tracking Any Point task, with a collection of ground-truth points for both real and synthetic videos.
- TAPIR is a two-stage algorithm: (1) a matching stage, which independently locates a suitable candidate point match for the query point on every other frame, and (2) a refinement stage, which updates both the trajectory and query features based on local correlations. The resulting model is fast and surpasses all prior methods by a significant margin on the TAP-Vid benchmark.
- RoboTAP is a system which utilizes TAPIR point tracks to execute robotics manipulation tasks through efficient imitation in the real world. It also includes a dataset with ground-truth points annotated on real robotics manipulation videos.
- BootsTAP (or Bootstrapped Training for TAP) uses a large dataset of unlabeled, real-world video to improve tracking accuracy. Specifically, the model is trained to give consistent predictions across different spatial transformations and corruptions of the video, as well as different choices of the query points. We apply it to TAPIR to create BootsTAPIR, which is architecturally similar to TAPIR but substantially outperforms it on TAP-Vid.
- TAPVid-3D is a benchmark and set of metrics for models that perform the 3D point tracking task. The benchmark contains 1M+ computed ground-truth trajectories on 4,000+ real-world videos.
- TAPNext is our latest, most capable, fastest, yet simplest tracker. It formulates the TAP problem as next token prediction and tracks points simply by propagating information through a network. Note that our best TAPNext checkpoint was fine-tuned using the BootsTAP procedure.
- TRAJAN is our first point TRAJectory AutoeNcoder. It conditions on a set of support point trajectories and reconstructs the trajectories of a held-out set of query points. The embedding space learned by TRAJAN can be used to compare distributions of videos, to compare motion trajectories in different videos independent of object appearances, and to evaluate the realism and consistency of videos output by generative video models.
- TAPNext++ is an improved TAPNext checkpoint that maintains stable tracking for 40x longer, tracks through occlusions, and shows strong re-detection capabilities. It has been fine-tuned on 1024-frame synthetic sequences with a variety of novel training strategies.
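The matching stage described for TAPIR can be illustrated with a minimal NumPy sketch. It is not the repository's implementation: the function name `match_query`, the array shapes, and the use of a hard argmax over a dot-product cost volume are assumptions chosen to keep the example short. The point is only that each frame is searched independently for the location whose feature best matches the query feature, and a later refinement stage would then adjust these candidates.

```python
import numpy as np

def match_query(query_feat, frame_feats):
    """Pick the best-matching location for one query on every frame.

    query_feat:  (C,) feature vector sampled at the query point (assumed).
    frame_feats: (T, H, W, C) per-frame feature maps (assumed layout).
    Returns an integer array of shape (T, 2) holding (x, y) per frame.
    """
    # Cost volume: dot product between the query feature and every location.
    cost = np.einsum("thwc,c->thw", frame_feats, query_feat)
    t, h, w = cost.shape
    # Each frame is handled independently; no temporal smoothing here.
    flat_idx = cost.reshape(t, h * w).argmax(axis=1)
    ys, xs = np.unravel_index(flat_idx, (h, w))
    return np.stack([xs, ys], axis=-1)

# Toy data: plant the query feature at a known location in each frame.
rng = np.random.default_rng(0)
T, H, W, C = 4, 8, 8, 16
query = rng.normal(size=C)
query /= np.linalg.norm(query)
feats = 0.01 * rng.normal(size=(T, H, W, C))
targets = np.array([[1, 2], [3, 2], [5, 4], [6, 7]])  # (x, y) per frame
for i, (x, y) in enumerate(targets):
    feats[i, y, x] = query

matches = match_query(query, feats)
```

On this toy input, `matches` recovers the planted `targets`, because the planted feature has a much larger dot product with the query than the low-amplitude background noise.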
This repository contains the following:
- [TAPNext / TAPNext++ / TAPIR / BootsTAPIR Demos](#demos), covering both online colab demos and an offline real-time demo that runs from a clone of this repo
- [TRAJAN Demo](#demos), an online colab demo
- [TAP-Vid Benchmark](#tap-vid), with both the evaluation dataset and the evaluation metrics
- [RoboTAP Benchmark](#roboTAP), with both the evaluation dataset and point-track-based clustering code
- [TAPVid-3D Benchmark](#tapvid-3d), with the evaluation metrics and sample evaluation code for the TAPVid-3D benchmark
- [Checkpoints](#checkpoints) for TAP-Net (the baseline presented in the TAP-Vid paper), TAPIR and BootsTAPIR pre-trained model weights in both Jax and PyTorch
- [Instructions](#training) for training TAP-Net (the baseline presented in the TAP-Vid paper) and TAPIR on Kubric
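As a rough illustration of the kind of metric a point-tracking benchmark reports, the sketch below computes a thresholded position accuracy over visible points, averaged across several pixel thresholds. The function name `position_accuracy`, the array shapes, and the threshold values are assumptions made for this example. The repository's own evaluation code remains the reference for the actual TAP-Vid metrics.

```python
import numpy as np

def position_accuracy(pred, gt, visible, thresholds=(1, 2, 4, 8, 16)):
    """Average, over thresholds, of the fraction of visible points
    whose predicted position lies within that many pixels of ground truth.

    pred, gt: (N, T, 2) predicted and ground-truth (x, y) positions.
    visible:  (N, T) boolean mask, True where the ground-truth point is visible.
    """
    dist = np.linalg.norm(pred - gt, axis=-1)  # (N, T) pixel errors
    n_visible = visible.sum()
    if n_visible == 0:
        raise ValueError("no visible points to evaluate")
    # Occluded points are excluded from both numerator and denominator.
    per_threshold = [
        ((dist < thr) & visible).sum() / n_visible for thr in thresholds
    ]
    return float(np.mean(per_threshold))

# Toy example: one track over four frames, last frame occluded.
gt = np.zeros((1, 4, 2))
pred = np.array([[[0.5, 0.0], [3.0, 0.0], [20.0, 0.0], [0.0, 0.0]]])
visible = np.array([[True, True, True, False]])
score = position_accuracy(pred, gt, visible)
```

Averaging over several thresholds rewards both coarse and precise localization, rather than tying the score to a single pixel tolerance.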
Demos
The simplest way to run TAPNext / TAPNext++ / TAPIR / BootsTAPIR is to use our colab demos online. You can also clone this repo and run on your own hardware, including a real-time demo.
Colab Demo
You can run colab demos to see how TAPIR works. You can also upload your own video and try point tracking with TAPIR. We provide a few colab demos:
1. TAPNext++: This is a fine-tuned BootsTAPNext checkpoint capable of long-term tracking, occlusion tracking and re-detection. It has been fine-tuned on PointOdyssey and Kubric-1024.
2. BootsTAPNext: This is the most powerful TAPNext model that runs online (per-frame). This is the BootsTAPNext model reported in the paper.
3. BootsTAPNext PyTorch: This is the most powerful TAPNext model re-implemented in PyTorch, with the same architecture and weights as the Jax model.
4. Standard TAPIR: This is the most powerful TAPIR / BootsTAPIR model that runs on a whole video at once. We mainly report the results of this model in the paper.
5. Online TAPIR: This is the sequential causal TAPIR / BootsTAPIR model that allows for online tracking of points, and can run in real time on a GPU platform.
6. Rainbow Visualization: This visualization is used in many of our teaser videos: it does automatic foreground/background segmentation and corrects the tracks for the camera motion, so you can visualize the paths objects take through real space.
7. Standard PyTorch TAPIR: This is the TAPIR / BootsTAPIR model re-implemented in PyTorch, with the same architecture and weights as the Jax model.
8. Online PyTorch TAPIR: This is the sequential causal BootsTAPIR model re-implemented in PyTorch, with the same architecture and weights as the Jax model.
9. TRAJAN: This is the point trajectory autoencoder for reconstructing the motion of held out point trajectories conditioned on a set of input point trajectories.
10....