Repo: Replicate (published Oct 25, 2023)

replicate/cog-owlvit

Python

Description: Cog wrapper for OWL-ViT

Language: Python

License: Apache-2.0

Stars: 0

Forks: 1

Open issues: 0

Created: 2023-10-25T12:54:07Z

Pushed: 2023-10-25T13:04:10Z

Default branch: main

Fork: no

Archived: no

README:

Cog-OWL-ViT

This is an implementation of Google's [OWL-ViT (v1)](https://github.com/google-research/scenic/tree/main/scenic/projects/owl_vit) as a Cog model. OWL-ViT uses a CLIP backbone to perform text-guided, open-vocabulary object detection. To use the model, provide the image you want to query and list the objects to detect as comma-separated text. For more details, see this Replicate model.
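The comma-separated query format can be illustrated with a small sketch. The helper name `parse_query` is hypothetical and is not taken from this repository. It assumes only that each comma-separated entry becomes one text prompt for the detector; the actual predictor may handle whitespace or empty entries differently.

```python
from typing import List


def parse_query(query: str) -> List[str]:
    """Split a comma-separated query string into individual text prompts.

    Hypothetical helper: illustrates the input format, not the repo's code.
    """
    # Trim whitespace around each entry and drop empty ones (e.g. a trailing comma).
    return [part.strip() for part in query.split(",") if part.strip()]


prompts = parse_query("human face, rocket, star-spangled banner, nasa badge")
print(prompts)
```

Each resulting string would then be treated as a separate open-vocabulary class name to search for in the image.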

Development

Follow the model pushing guide to push your own fork of OWL-ViT to Replicate.

Basic Usage

To run a prediction:

cog predict -i image=@data/astronaut.png -i query="human face, rocket, star-spangled banner, nasa badge"

To build the Cog image and launch the API locally:

cog run -p 5000 python -m cog.server.http
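Once the server is running, a prediction can be requested over HTTP. The following is a minimal sketch, assuming the server listens on `http://localhost:5000` and exposes Cog's standard `POST /predictions` endpoint with the `image` and `query` inputs used above. The function names are illustrative, and the structure of the returned `output` depends on this model's predictor, so it is not assumed here.

```python
import base64
import json
import mimetypes
import urllib.request
from pathlib import Path


def to_data_uri(path: str) -> str:
    """Encode a local file as a data URI, a form Cog accepts for file inputs."""
    mime = mimetypes.guess_type(path)[0] or "application/octet-stream"
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_request(image_path: str, query: str,
                  base_url: str = "http://localhost:5000") -> urllib.request.Request:
    """Build the POST /predictions request without sending it."""
    # Cog expects model inputs nested under an "input" key.
    payload = {"input": {"image": to_data_uri(image_path), "query": query}}
    return urllib.request.Request(
        f"{base_url}/predictions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(image_path: str, query: str,
            base_url: str = "http://localhost:5000") -> dict:
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(image_path, query, base_url)) as response:
        return json.loads(response.read())


# Example usage (requires the local server to be running):
# result = predict("data/astronaut.png", "human face, rocket")
# print(result.get("status"), result.get("output"))
```

Keeping request construction separate from sending makes the payload easy to inspect before any network call is made.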

References

@article{minderer2022simple,
title={Simple Open-Vocabulary Object Detection with Vision Transformers},
author={Matthias Minderer and Alexey Gritsenko and Austin Stone and Maxim Neumann and Dirk Weissenborn and Alexey Dosovitskiy and Aravindh Mahendran and Anurag Arnab and Mostafa Dehghani and Zhuoran Shen and Xiao Wang and Xiaohua Zhai and Thomas Kipf and Neil Houlsby},
journal={ECCV},
year={2022},
}