google-deepmind/tips


Description: TIPSv2 (CVPR'26) and TIPS (ICLR'25)

Language: Jupyter Notebook

License: Apache-2.0

Stars: 543

Forks: 36

Open issues: 1

Created: 2025-03-03T13:18:42Z

Pushed: 2026-06-01T23:42:16Z

Default branch: main

Fork: no

Archived: no

README: [Demo Colab (PyTorch)](https://colab.research.google.com/github/google-deepmind/tips/blob/main/pytorch/TIPS_Demo.ipynb)

TIPS / TIPSv2

This repository contains the implementation and models introduced in:

  • TIPSv2: Advancing Vision-Language Pretraining with Enhanced Patch-Text Alignment, CVPR 2026
  • TIPS: Text-Image Pretraining with Spatial Awareness, ICLR 2025

The TIPS series of models (Text-Image Pretraining with Spatial Awareness) are foundational image-text encoders built for general-purpose computer vision and multimodal applications. Our models were validated on a comprehensive suite of 9 tasks and 20 datasets, displaying excellent performance that matches or exceeds other recent vision encoders, with particularly strong spatial awareness.

We recommend using the latest version, TIPSv2, but still provide the earlier TIPSv1 for completeness. For a more detailed overview, please visit the Project Webpage and check out the papers listed above.

See also our [demos and notebooks](#demos-and-notebooks) for a quick start.

Demos and notebooks

[Inference Colab in PyTorch](https://colab.research.google.com/github/google-deepmind/tips/blob/main/pytorch/TIPS_Demo.ipynb)

[Inference Colab in JAX](https://colab.research.google.com/github/google-deepmind/tips/blob/main/scenic/notebooks/TIPS_Demo.ipynb)

We also provide task-specific notebooks:

[Zero-shot segmentation (PyTorch)](https://colab.research.google.com/github/google-deepmind/tips/blob/main/pytorch/TIPS_zeroshot_segmentation.ipynb)

[Train a linear head for foreground segmentation (PyTorch)](https://colab.research.google.com/github/google-deepmind/tips/blob/main/pytorch/TIPS_foreground_segmentation_demo.ipynb)

[Inference with DPT heads for segmentation, depth and normals (PyTorch)](https://colab.research.google.com/github/google-deepmind/tips/blob/main/pytorch/TIPS_decoder_inference.ipynb)
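Zero-shot segmentation with an image-text encoder is commonly done by comparing each patch embedding with one text embedding per class name and picking the closest class. The sketch below illustrates that general idea in plain Python. It is not the code from the notebook and does not use the TIPS API: the function names and the toy 2-D embeddings are illustrative assumptions, and in practice the embeddings would come from the vision and text encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def zero_shot_segment(patch_embeddings, text_embeddings):
    """Label every patch with the index of its most similar class text embedding.

    patch_embeddings: grid (list of rows) of patch vectors.
    text_embeddings: one vector per class name.
    """
    num_classes = len(text_embeddings)
    return [
        [
            max(range(num_classes), key=lambda c: cosine(patch, text_embeddings[c]))
            for patch in row
        ]
        for row in patch_embeddings
    ]


# Toy stand-ins (assumed values): two classes and a 2x2 grid of patches.
text = [[1.0, 0.0], [0.0, 1.0]]
patches = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.1, 0.9]],
]
segmentation = zero_shot_segment(patches, text)
print(segmentation)
```

The resulting grid of class indices has one entry per patch, so it would typically be upsampled to the image resolution before being used as a segmentation mask.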

How to use

We provide both PyTorch and JAX (Scenic) implementations:

  • tips/pytorch/: PyTorch inference for the model.
  • tips/scenic/: JAX-based inference using the scenic library.

We provide links to all available checkpoints, for both PyTorch and JAX model definitions, together with representative evaluation results.

TIPSv2 models are also available on Hugging Face.

TIPSv2 models

| Model size | #Params vision / text | PyTorch ckpt. | JAX ckpt. | PASCAL seg.↑ | NYU-depth↓ | ImageNet-KNN↑ | Flickr I→T↑ | Flickr T→I↑ | ADE150-ZS↑ |
| :--------- | :-------------------- | :----------: | :------: | :---------: | :-------: | :----------: | :------: | :--------: | :--------: |
| g/14 | 1.1B / 389.1M | [vision][v2-pth-g14-vision] \| [text][v2-pth-g14-text] | [vision][v2-jax-g14-vision] \| [text][v2-jax-g14-text] | 85.1 | 0.334 | 83.7 | 95.1 | 85.9 | 17.8 |
| SO/14 | 412.4M / 448.3M | [vision][v2-pth-so14-vision] \| [text][v2-pth-so14-text] | [vision][v2-jax-so14-vision] \| [text][v2-jax-so14-text] | 85.2 | 0.339 | 82.8 | 94.8 | 84.0 | 23.3 |
| L/14 | 303.2M / 183.9M | [vision][v2-pth-l14-vision] \| [text][v2-pth-l14-text] | [vision][v2-jax-l14-vision] \| [text][v2-jax-l14-text] | 85.1 | 0.339 | 82.5 | 95.4 | 83.3 | 24.7 |
| B/14 | 85.7M / 109.6M | [vision][v2-pth-b14-vision] \| [text][v2-pth-b14-text] | [vision][v2-jax-b14-vision] \| [text][v2-jax-b14-text] | 84.0 | 0.374 | 79.8 | 92.6 | 80.0 | 17.4 |
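When choosing between the TIPSv2 checkpoints, it can help to filter by vision-encoder size and then compare a single metric. The snippet below is a minimal sketch of that selection using a few columns copied from the table above. The dictionary layout, the key names, and the `best_model` helper are illustrative assumptions and are not part of the repository; the 1.1B vision parameter count is entered as roughly 1100M.

```python
# Selected columns from the TIPSv2 table; vision parameter counts are in millions.
TIPSV2 = {
    "g/14": {"vision_params_m": 1100.0, "pascal_seg": 85.1, "nyu_depth": 0.334,
             "imagenet_knn": 83.7, "ade150_zs": 17.8},
    "SO/14": {"vision_params_m": 412.4, "pascal_seg": 85.2, "nyu_depth": 0.339,
              "imagenet_knn": 82.8, "ade150_zs": 23.3},
    "L/14": {"vision_params_m": 303.2, "pascal_seg": 85.1, "nyu_depth": 0.339,
             "imagenet_knn": 82.5, "ade150_zs": 24.7},
    "B/14": {"vision_params_m": 85.7, "pascal_seg": 84.0, "nyu_depth": 0.374,
             "imagenet_knn": 79.8, "ade150_zs": 17.4},
}

# Metrics marked with a down arrow in the table: smaller values are better.
LOWER_IS_BETTER = {"nyu_depth"}


def best_model(metric, max_vision_params_m=float("inf")):
    """Return the model with the best `metric` among those within the size budget."""
    candidates = {
        name: row for name, row in TIPSV2.items()
        if row["vision_params_m"] <= max_vision_params_m
    }
    if not candidates:
        raise ValueError("no model fits the requested parameter budget")
    sign = -1.0 if metric in LOWER_IS_BETTER else 1.0
    return max(candidates, key=lambda name: sign * candidates[name][metric])


print(best_model("ade150_zs"))
print(best_model("pascal_seg", max_vision_params_m=350))
```

Note that the largest model is not the best on every column, which is why the helper takes the metric as an argument rather than ranking by size.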

TIPSv1 models

| Model size | #Params vision / text | PyTorch ckpt. | JAX ckpt. | PASCAL seg.↑ | NYU-depth↓ | ImageNet-KNN↑ | UNED-KNN↑ | Flickr I→T↑ | Flickr T→I↑ |
| :--------- | :-------------------- | :----------: | :------: | :---------: | :-------: | :----------: | :------: | :--------: | :--------: |
| g/14-HR | 1.1B / 389.1M | [vision][v1-pth-g14-hr-vision] \| [text][v1-pth-g14-hr-text] | [vision][v1-jax-g14-hr-vision] \| [text][v1-jax-g14-hr-text] | 83.1 | 0.363 | 83.2 | 68.4 | 93.8 | 83.8 |
| g/14-LR | 1.1B / 389.1M | [vision][v1-pth-g14-lr-vision] \| [text][v1-pth-g14-lr-text] | [vision][v1-jax-g14-lr-vision] \| [text][v1-jax-g14-lr-text] | 82.0 | 0.390 | 83.6 | 71.5 | 93.4 | 82.1 |
| SO/14-HR | 412.4M / 448.3M | [vision][v1-pth-so14-hr-vision] \| [text][v1-pth-so14-hr-text] | [vision][v1-jax-so14-hr-vision] \| [text][v1-jax-so14-hr-text] | 83.7 | 0.362 | 83.0 | 68.6 | 94.2 | 83.8 |
| L/14-HR | 303.2M / 183.9M | [vision][v1-pth-l14-hr-vision] \| [text][v1-pth-l14-hr-text] | [vision][v1-jax-l14-hr-vision] \| [text][v1-jax-l14-hr-text] | 83.9 | 0.372 | 82.5 | 67.8 | 93.6 | 83.5 |
| B/14-HR | 85.7M / 109.6M | [vision][v1-pth-b14-hr-vision] \| [text][v1-pth-b14-hr-text] | [vision][v1-jax-b14-hr-vision] \| [text][v1-jax-b14-hr-text] | 82.9 | 0.379 | 80.0 | 62.7 | 91.3 | 79.4 |
| S/14-HR | 21.6M / 33.6M | [vision][v1-pth-s14-hr-vision] \| [text][v1-pth-s14-hr-text] | [vision][v1-jax-s14-hr-vision] \| [text][v1-jax-s14-hr-text] | 80.6 | 0.425 | 75.1 | 57.7 | 86.3 | 74.7 |

Local Installation

To install locally instead of using the Colabs/HF, please follow the instructions below.

Installation (PyTorch)

Manage dependencies with a custom environment (e.g., Conda).

conda create -n tips python=3.11

# Activate the environment.
conda activate tips

Install PyTorch dependencies.

# Install PyTorch (change to the GPU version if needed).
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

# Install other dependencies.
pip install tensorflow_text mediapy jax jaxlib scikit-learn

# Optionally, install Jupyter to use the notebook.
pip install jupyter
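After running the install commands, a quick sanity check is to confirm that each package can be found by the active interpreter. The script below is a small sketch for that purpose and is not part of the repository. The module list is an assumption derived from the pip commands above, with scikit-learn checked under its import name `sklearn`; the `missing_modules` helper is hypothetical.

```python
import importlib.util

# Import names for the packages installed above (scikit-learn imports as "sklearn").
REQUIRED_MODULES = [
    "torch", "torchvision", "torchaudio",
    "tensorflow_text", "mediapy", "jax", "jaxlib", "sklearn",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```

`importlib.util.find_spec` only locates a top-level module without importing it, so the check stays fast even for heavy packages. It does not verify versions or GPU support.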

Clone the code from this repo.

git clone https://github.com/google-deepmind/tips.git

# Add the…
