What does this repo signal mean?

Meta AI (Llama) published meta-llama/llama (Python). This repository signal exposes tooling, eval, infrastructure, or model-adjacent work before it may appear in a launch post. High-signal details: repo meta-llama/llama · language Python · Flagship model release, 59k stars.. onlylabs links this event to 1 captured evidence page and 6 related repo signals. It also maps to Infrastructure in the data-business radar.

Meta AI (Llama) Repo: meta-llama/llama

Captured source

source ↗

GitHub/github.com/meta-llama/llama

meta-llama/llama repository metadata

Source ↗

published Feb 14, 2023seen Jun 5captured Jun 11http 200method plain

meta-llama/llama

Description: Inference code for Llama models

Language: Python

License: NOASSERTION

Stars: 59454

Forks: 9788

Open issues: 520

Created: 2023-02-14T09:29:12Z

Pushed: 2025-01-26T21:42:26Z

Default branch: main

Fork: no

Archived: no

README:

Note of deprecation

Thank you for developing with Llama models. As part of the Llama 3.1 release, we’ve consolidated GitHub repos and added some additional repos as we’ve expanded Llama’s functionality into being an e2e Llama Stack. Please use the following repos going forward:

llama-models - Central repo for the foundation models including basic utilities, model cards, license and use policies
PurpleLlama - Key component of Llama Stack focusing on safety risks and inference time mitigations
llama-toolchain - Model development (inference/fine-tuning/safety shields/synthetic data generation) interfaces and canonical implementations
llama-agentic-system - E2E standalone Llama Stack system, along with opinionated underlying interface, that enables creation of agentic applications
llama-cookbook - Community driven scripts and integrations

If you have any questions, please feel free to file an issue on any of the above repos and we will do our best to respond in a timely manner.

Thank you!

(Deprecated) Llama 2

We are unlocking the power of large language models. Llama 2 is now accessible to individuals, creators, researchers, and businesses of all sizes so that they can experiment, innovate, and scale their ideas responsibly.

This release includes model weights and starting code for pre-trained and fine-tuned Llama language models — ranging from 7B to 70B parameters.

This repository is intended as a minimal example to load Llama 2 models and run inference. For more detailed examples leveraging Hugging Face, see llama-cookbook.

Updates post-launch

See [UPDATES.md](UPDATES.md). Also for a running list of frequently asked questions, see here.

Download

In order to download the model weights and tokenizer, please visit the Meta website and accept our License.

Once your request is approved, you will receive a signed URL over email. Then run the download.sh script, passing the URL provided when prompted to start the download.

Pre-requisites: Make sure you have wget and md5sum installed. Then run the script: ./download.sh.

Keep in mind that the links expire after 24 hours and a certain amount of downloads. If you start seeing errors such as 403: Forbidden, you can always re-request a link.

Access to Hugging Face

We are also providing downloads on Hugging Face. You can request access to the models by acknowledging the license and filling the form in the model card of a repo. After doing so, you should get access to all the Llama models of a version (Code Llama, Llama 2, or Llama Guard) within 1 hour.

Quick Start

You can follow the steps below to quickly get up and running with Llama 2 models. These steps will let you run quick inference locally. For more examples, see the Llama 2 cookbook repository.

1. In a conda env with PyTorch / CUDA available clone and download this repository.

2. In the top-level directory run:

pip install -e .

3. Visit the Meta website and register to download the model/s.

4. Once registered, you will get an email with a URL to download the models. You will need this URL when you run the download.sh script.

5. Once you get the email, navigate to your downloaded llama repository and run the download.sh script.

Make sure to grant execution permissions to the download.sh script
During this process, you will be prompted to enter the URL from the email.
Do not use the “Copy Link” option but rather make sure to manually copy the link from the email.

6. Once the model/s you want have been downloaded, you can run the model locally using the command below:

torchrun --nproc_per_node 1 example_chat_completion.py \
--ckpt_dir llama-2-7b-chat/ \
--tokenizer_path tokenizer.model \
--max_seq_len 512 --max_batch_size 6

Note

Replace llama-2-7b-chat/ with the path to your checkpoint directory and tokenizer.model with the path to your tokenizer model.
The –nproc_per_node should be set to the [MP](#inference) value for the model you are using.
Adjust the max_seq_len and max_batch_size parameters as needed.
This example runs the [example_chat_completion.py](example_chat_completion.py) found in this repository but you can change that to a different .py file.

Inference

Different models require different model-parallel (MP) values:

| Model | MP | |--------|----| | 7B | 1 | | 13B | 2 | | 70B | 8 |

All models support sequence length up to 4096 tokens, but we pre-allocate the cache according to max_seq_len and max_batch_size values. So set those according to your hardware.

Pretrained Models

These models are not finetuned for chat or Q&A. They should be prompted so that the expected answer is the natural continuation of the prompt.

See example_text_completion.py for some examples. To illustrate, see the command below to run it with the llama-2-7b model (nproc_per_node needs to be set to the MP value):

torchrun --nproc_per_node 1 example_text_completion.py \
--ckpt_dir llama-2-7b/ \
--tokenizer_path tokenizer.model \
--max_seq_len 128 --max_batch_size 4

Fine-tuned Chat Models

The fine-tuned models were trained for dialogue applications. To get the expected features and performance for them, a specific formatting defined in `chat_completion` needs to be followed, including the INST and > tags, BOS and EOS tokens, and the whitespaces and breaklines in between (we recommend calling strip() on inputs to avoid double-spaces).

You can also deploy...

Excerpt shown — open the source for the full document.

Notability

notability 10.0/10

Flagship model release, 59k stars.