replicate/cog-vila
Description: Cog wrapper for VILA
Language: Python
License: Apache-2.0
Stars: 0
Forks: 0
Open issues: 0
Created: 2024-03-13T13:55:09Z
Pushed: 2024-03-13T14:05:04Z
Default branch: main
Fork: no
Archived: no
README:
VILA
Cog wrapper for VILA, a visual language model (VLM) pretrained with interleaved image-text data. See the paper, official repo and Replicate demos for details.
How to use the API
You need Cog and Docker installed to run this model locally. To build the Docker container with Cog and run a prediction:
cog predict -i image=@sample_images/1.jpg -i prompt="Can you describe this image?"
To start an HTTP server so that you can send requests to your locally or remotely deployed API:
cog run -p 5000 python -m cog.server.http
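Once the server is running, a request could be sent from Python roughly as follows. This is a minimal sketch, not part of the repository: it assumes the server is reachable at http://localhost:5000 and exposes Cog's standard `POST /predictions` endpoint, which takes a JSON body of the form `{"input": {...}}` and accepts file inputs as data URIs or URLs. The helper names `build_payload` and `predict` are hypothetical.

```python
import base64
import json
import urllib.request


def build_payload(image_bytes: bytes, prompt: str, mime: str = "image/jpeg", **options) -> dict:
    """Wrap the model inputs in the {"input": {...}} envelope used by Cog's HTTP server."""
    # Encode the image as a data URI; an http(s) URL string could be used instead.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_uri = f"data:{mime};base64,{encoded}"
    # Extra keyword arguments (e.g. max_tokens=256) are passed through unchanged.
    return {"input": {"image": data_uri, "prompt": prompt, **options}}


def predict(payload: dict, url: str = "http://localhost:5000/predictions") -> dict:
    """POST the payload to the (assumed) local endpoint and return the decoded JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example usage (requires the server started with the command above):
#
#   with open("sample_images/1.jpg", "rb") as f:
#       payload = build_payload(f.read(), "Can you describe this image?")
#   print(predict(payload).get("output"))
```

Only the standard library is used, so the client needs no extra dependencies. For a remotely deployed API, the `url` argument and any authentication headers would need to be adapted.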
To use VILA, provide an image and a text prompt. The response is generated by decoding the model's output using beam search with the specified parameters. The input arguments to the API are as follows:
- image: The image to discuss.
- prompt: The query to generate a response for.
- top_p: When decoding text, samples only from the most likely tokens whose cumulative probability reaches p; lower it to ignore less likely tokens.
- temperature: When decoding text, higher values make the output more random and varied; lower values make it more deterministic.
- num_beams: Number of beams to use when decoding text; higher values explore more candidate sequences, which is slower but can produce higher-likelihood output.
- max_tokens: Maximum number of tokens to generate.
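To make the effect of temperature and top_p concrete, the following sketch shows how these two parameters conventionally reshape a next-token distribution. It illustrates the general technique (temperature scaling followed by nucleus filtering) and is not taken from this wrapper or from VILA; the function name `top_p_filter` and the toy logits are hypothetical.

```python
import math


def top_p_filter(logits, temperature=1.0, top_p=1.0):
    """Return {token_index: probability} after temperature scaling and top-p filtering."""
    # Temperature rescales the logits: values below 1 sharpen the distribution,
    # values above 1 flatten it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]  # subtract the max for stability
    total = sum(weights)
    probs = [w / total for w in weights]

    # Keep the most likely tokens until their cumulative probability reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalise so the surviving probabilities sum to 1.
    kept_total = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_total for i in kept}


# Toy example with three candidate tokens.
print(top_p_filter([2.0, 1.0, 0.0], temperature=1.0, top_p=0.9))
```

With these toy logits and top_p=0.9, the least likely token is dropped and the remaining two are renormalised; a sampler would then draw the next token from that reduced set.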
References
@misc{lin2023vila,
title={VILA: On Pre-training for Visual Language Models},
author={Ji Lin and Hongxu Yin and Wei Ping and Yao Lu and Pavlo Molchanov and Andrew Tao and Huizi Mao and Jan Kautz and Mohammad Shoeybi and Song Han},
year={2023},
eprint={2312.07533},
archivePrefix={arXiv},
primaryClass={cs.CV}
}