replicate/cog-owlvit
Description: Cog wrapper for OWL-ViT
Language: Python
License: Apache-2.0
Stars: 0
Forks: 1
Open issues: 0
Created: 2023-10-25T12:54:07Z
Pushed: 2023-10-25T13:04:10Z
Default branch: main
Fork: no
Archived: no
README:
Cog-OWL-ViT
This is an implementation of Google's [OWL-ViT (v1)](https://github.com/google-research/scenic/tree/main/scenic/projects/owl_vit) as a Cog model. OWL-ViT uses a CLIP backbone to perform text-guided, open-vocabulary object detection. To use the model, provide the image you want to search and list the objects to detect as comma-separated text. For more details, see this Replicate model.
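As a minimal sketch of how a comma-separated query might be turned into a list of text prompts before it reaches the detector: the helper name `parse_query` is hypothetical and is not taken from this repository's code, which may handle the query differently.

```python
def parse_query(query: str) -> list[str]:
    """Split a comma-separated query into clean, non-empty labels.

    Hypothetical helper for illustration; not part of this repository.
    """
    # Strip whitespace around each label and drop empty entries,
    # such as those produced by a trailing or doubled comma.
    return [label.strip() for label in query.split(",") if label.strip()]


labels = parse_query("human face, rocket, star-spangled banner, nasa badge")
```

Each resulting label would then serve as one open-vocabulary text prompt for the image.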
Development
Follow the model pushing guide to push your own fork of OWL-ViT to Replicate.
Basic Usage
To run a prediction:
cog predict -i image=@data/astronaut.png -i query="human face, rocket, star-spangled banner, nasa badge"
To build the Cog image and launch the API locally:
cog run -p 5000 python -m cog.server.http
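Once the server is listening on port 5000, predictions can be requested over HTTP. The sketch below uses only the Python standard library. It assumes Cog's standard `POST /predictions` endpoint with a JSON body of the form `{"input": {...}}`, and that the input names `image` and `query` match the `cog predict` command above. The function names (`to_data_uri`, `build_request`, `predict`) are hypothetical, and the shape of the response is not specified here, so it is returned as parsed JSON without further interpretation.

```python
import base64
import json
import mimetypes
import urllib.request
from pathlib import Path


def to_data_uri(path: str) -> str:
    """Encode a local file as a data URI so it can be embedded in JSON."""
    mime = mimetypes.guess_type(path)[0] or "application/octet-stream"
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_request(image: str, query: str,
                  host: str = "http://localhost:5000") -> urllib.request.Request:
    """Build the POST request; `image` is a URL or a data URI."""
    # Assumption: input names mirror the `cog predict -i image=... -i query=...` flags.
    payload = {"input": {"image": image, "query": query}}
    return urllib.request.Request(
        f"{host}/predictions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(image_path: str, query: str) -> dict:
    """Send one prediction request to the locally running server."""
    request = build_request(to_data_uri(image_path), query)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server running, a call such as `predict("data/astronaut.png", "human face, rocket")` would send the same inputs as the `cog predict` example.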
References
@article{minderer2022simple,
title={Simple Open-Vocabulary Object Detection with Vision Transformers},
author={Matthias Minderer and Alexey Gritsenko and Austin Stone and Maxim Neumann and Dirk Weissenborn and Alexey Dosovitskiy and Aravindh Mahendran and Anurag Arnab and Mostafa Dehghani and Zhuoran Shen and Xiao Wang and Xiaohua Zhai and Thomas Kipf and Neil Houlsby},
journal={ECCV},
year={2022},
}