Repo: Replicate (published Feb 5, 2024)

replicate/cog-mamba

Python

Description: Cog wrapper for Mamba language models

Language: Python

License: Apache-2.0

Stars: 1

Forks: 0

Open issues: 0

Created: 2024-02-05T18:24:05Z

Pushed: 2024-02-05T19:06:40Z

Default branch: main

Fork: no

Archived: no

README:

Cog wrapper for Mamba LLMs

This is a Cog wrapper for Mamba language models. See the original repo, paper, and Replicate demo for details.

Basic Usage

You need Cog and Docker installed to serve your model as an API. To push your own fork of the model to Replicate with Cog, follow the model pushing guide. To run a prediction:

cog predict -i prompt="How are you doing today?"

To start your server and serve the model as an API:

cog run -p 5000 python -m cog.server.http
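
Once the server is running, predictions can be requested over HTTP. The following is a minimal sketch, assuming the server is reachable at localhost:5000 (the port from the command above) and exposes Cog's standard POST /predictions endpoint, which takes a JSON body with the inputs wrapped in an "input" object. The helper names `build_prediction_request` and `predict` are hypothetical and not part of this repo.

```python
import json
import urllib.request

# Assumption: the server started above is reachable on localhost:5000.
PREDICTIONS_URL = "http://localhost:5000/predictions"


def build_prediction_request(prompt, url=PREDICTIONS_URL, **params):
    """Build a POST request whose JSON body wraps the inputs in an "input" object."""
    payload = {"input": {"prompt": prompt, **params}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(prompt, **params):
    """Send the request to the running server and return the decoded JSON response."""
    request = build_prediction_request(prompt, **params)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, `predict("How are you doing today?", max_length=100, temperature=0.75)` should return Cog's JSON response, in which the generated text is expected under the `output` key. Building the request separately from sending it makes the payload easy to inspect without a live server.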

The API input arguments are as follows:

  • prompt: The text prompt for Mamba.
  • max_length: Maximum number of tokens to generate. For English text, a word is roughly one to two tokens, depending on the tokenizer.
  • temperature: Adjusts the randomness of outputs: values above 1 are more random, 0 is deterministic, and 0.75 is a good starting value.
  • top_p: Samples from the smallest set of most likely tokens whose cumulative probability reaches p during text decoding; lower it to ignore less likely tokens.
  • top_k: Samples from the top k most likely tokens during text decoding; lower it to ignore less likely tokens.
  • repetition_penalty: Penalty for repeated words in generated text; 1 is no penalty, values greater than 1 discourage repetition, and values less than 1 encourage it.
  • seed: The seed for reproducible text generation. Set a specific seed to reproduce results, or leave it blank for random generation.
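
To make the interaction between temperature, top_k, top_p, and seed concrete, here is a generic sketch in plain Python. It is an illustration of the usual technique, not the code this wrapper or the Mamba library uses. The function names are hypothetical, treating `top_k=0` as "no filter" is an assumption, and `repetition_penalty` is left out for brevity.

```python
import math
import random


def sampling_distribution(logits, temperature=0.75, top_k=0, top_p=1.0):
    """Return per-token probabilities after temperature, top-k, and top-p filtering."""
    if temperature <= 0:
        # Treat temperature 0 as greedy decoding: all mass on the most likely token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]

    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Rank tokens from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # top_k: keep only the k most likely tokens (0 disables the filter here).
    if top_k > 0:
        order = order[:top_k]

    # top_p: keep the smallest prefix whose cumulative probability reaches p.
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize over the surviving tokens.
    mass = sum(probs[i] for i in kept)
    return [probs[i] / mass if i in kept else 0.0 for i in range(len(probs))]


def sample_token(logits, seed=None, **kwargs):
    """Draw one token index; a fixed seed makes the draw reproducible."""
    rng = random.Random(seed)
    probs = sampling_distribution(logits, **kwargs)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Lowering `top_k` or `top_p` shrinks the set of candidate tokens, while lowering `temperature` concentrates probability on the most likely ones. Calling `sample_token` twice with the same `seed` and settings yields the same token index.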

References

@article{mamba,
title={Mamba: Linear-Time Sequence Modeling with Selective State Spaces},
author={Gu, Albert and Dao, Tri},
journal={arXiv preprint arXiv:2312.00752},
year={2023}
}