NVIDIA/garak


Description: the LLM vulnerability scanner

Language: Python

License: Apache-2.0

Stars: 8076

Forks: 1010

Open issues: 327

Created: 2023-05-10T18:52:16Z

Pushed: 2026-06-10T19:41:45Z

Default branch: main

Fork: no

Archived: no

README:

garak, LLM vulnerability scanner

*Generative AI Red-teaming & Assessment Kit*

garak checks if an LLM can be made to fail in a way we don't want. garak probes for hallucination, data leakage, prompt injection, misinformation, toxicity generation, jailbreaks, and many other weaknesses. If you know nmap or msf / Metasploit Framework, garak does somewhat similar things to them, but for LLMs.

garak focuses on ways of making an LLM or dialog system fail. It combines static, dynamic, and adaptive probes to explore this.

garak's a free tool. We love developing it and are always interested in adding functionality to support applications.

![Tests/Linux](https://github.com/NVIDIA/garak/actions/workflows/test_linux.yml) ![Tests/Windows](https://github.com/NVIDIA/garak/actions/workflows/test_windows.yml) ![Tests/OSX](https://github.com/NVIDIA/garak/actions/workflows/test_macos.yml) ![Documentation Status](http://garak.readthedocs.io/en/latest/?badge=latest) ![PyPI](https://badge.fury.io/py/garak) ![Downloads](https://pepy.tech/project/garak)

Get started

> See our user guide! docs.garak.ai

> Join our Discord!

> Project links & home: garak.ai

> Twitter: @garak_llm

> DEF CON slides!

LLM support

currently supports:

Install:

garak is a command-line tool. It's developed on Linux and OSX.

Standard install with pip

Just grab it from PyPI and you should be good to go:

`python -m pip install -U garak`

Install development version with pip

The standard pip version of garak is updated periodically. To get a fresher version from GitHub, try:

`python -m pip install -U git+https://github.com/NVIDIA/garak.git@main`

Clone from source

garak has its own dependencies. You can install garak in its own Conda environment:

conda create --name garak "python>=3.10,`

`garak` needs to know what model to scan, and by default, it'll try all the probes it knows on that model, using the vulnerability detectors recommended by each probe. You can see a list of probes using:

`garak --list_probes`

To specify a generator, use the `--target_type` and, optionally, the `--target_name` options. The target type specifies a model family/interface; the target name specifies the exact model to be used. The "Intro to generators" section below describes some of the generators supported. A straightforward generator family is Hugging Face models; to load one of these, set `--target_type` to `huggingface` and `--target_name` to the model's name on Hub (e.g. `"RWKV/rwkv-4-169m-pile"`). Some generators might need an API key to be set as an environment variable, and they'll let you know if they need that.

`garak` runs all the probes by default, but you can be specific about that too. `--probes promptinject` will use only the [PromptInject](https://github.com/agencyenterprise/promptinject) framework's methods, for example. You can also specify one specific plugin instead of a plugin family by adding the plugin name after a `.`; for example, `--probes lmrc.SlurUsage` will use an implementation of checking for models generating slurs based on the [Language Model Risk Cards](https://arxiv.org/abs/2303.18190) framework.
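If you want to drive garak from a script, the options above can be assembled programmatically. The following is a minimal sketch, not part of garak itself: `build_garak_argv` is a hypothetical helper name, and it uses only the `--target_type`, `--target_name`, and `--probes` options described above.

```python
import shlex
import sys


def build_garak_argv(target_type, target_name=None, probes=None):
    """Assemble an argv list for `python -m garak`.

    Hypothetical helper for illustration; it only uses the
    --target_type, --target_name and --probes options from the README.
    """
    argv = [sys.executable, "-m", "garak", "--target_type", target_type]
    if target_name is not None:
        # The target name is optional; it picks the exact model to scan.
        argv += ["--target_name", target_name]
    if probes is not None:
        # A probe family ("promptinject") or a single plugin ("lmrc.SlurUsage").
        argv += ["--probes", probes]
    return argv


# Build the command for one plugin against a Hugging Face model and print it.
example_argv = build_garak_argv("huggingface", "gpt2", "lmrc.SlurUsage")
print(shlex.join(example_argv))
```

To actually launch the scan, the list can be passed to `subprocess.run(example_argv, check=True)`, which assumes garak is installed in the same Python environment.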

For help and inspiration, find us on [Twitter](https://twitter.com/garak_llm) or [Discord](https://discord.gg/uVch4puUCs)!

## Examples

Probe a commercial model for encoding-based prompt injection (OSX/\*nix) (replace example value with a real OpenAI API key)

`export OPENAI_API_KEY="sk-123XXXXXXXXXXXX"`

`python3 -m garak --target_type openai --target_name gpt-5-nano --probes encoding`

See if the Hugging Face version of GPT2 is vulnerable to DAN 11.0

`python3 -m garak --target_type huggingface --target_name gpt2 --probes dan.Dan_11_0`

## Reading the results

For each probe loaded, garak will print a progress bar as it generates. Once generation is complete, garak prints a row evaluating that probe's results on each detector. If any of the prompt attempts yielded an undesirable behavior, the response will be marked as FAIL, and the failure rate given.

Here are the results with the `encoding` module on a GPT-3 variant:
![alt text](https://i.imgur.com/8Dxf45N.png)

And the same results for ChatGPT:
![alt text](https://i.imgur.com/VKAF5if.png)

We can see that the more recent model is much more susceptible to encoding-based injection attacks, whereas text-babbage-001 was only found to be vulnerable to quoted-printable and MIME encoding injections. The figures at the end of each row, e.g. 840/840, indicate the number of text generations total and then how many of these seemed to behave OK. The figure can be quite high because more than one generation is made per prompt (10 by default).
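The arithmetic behind those figures can be sketched in a few lines. This is an illustration only: the function names are hypothetical, the counts are passed in explicitly rather than parsed from garak's output, and the prompt count assumes the run used the default of 10 generations per prompt.

```python
def prompts_covered(total_generations, generations_per_prompt=10):
    """Number of prompts behind a generation count, assuming a fixed
    number of generations per prompt (10 is the default named above)."""
    if generations_per_prompt <= 0:
        raise ValueError("generations_per_prompt must be positive")
    prompts, remainder = divmod(total_generations, generations_per_prompt)
    if remainder:
        raise ValueError("total is not a multiple of generations_per_prompt")
    return prompts


def failure_rate(total_generations, ok_generations):
    """Fraction of generations that did not behave OK."""
    if total_generations <= 0:
        raise ValueError("total_generations must be positive")
    if not 0 <= ok_generations <= total_generations:
        raise ValueError("ok_generations must be between 0 and the total")
    return (total_generations - ok_generations) / total_generations


print(prompts_covered(840))    # 840 generations at 10 per prompt -> 84
print(failure_rate(840, 840))  # every generation behaved OK -> 0.0
print(failure_rate(840, 630))  # 210 of 840 misbehaved -> 0.25
```

Under these assumptions, a row ending in 840/840 corresponds to 84 prompts with no failing generations.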

Errors go in `garak.log`; the run is logged in detail in a `.jsonl` file specified at analysis start & end. There's a basic analysis script in `analyse/analyse_log.py` which will output the probes and prompts that led to the most hits.
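Because the detailed run log is a `.jsonl` file (one JSON object per line), it is easy to explore with a few lines of Python. The sketch below is a generic JSONL tally and is not the bundled analysis script. The field names `kind` and `probe` in the sample data are hypothetical placeholders, not garak's actual report schema, so check a real report for the keys it uses.

```python
import json
from collections import Counter


def count_by_key(jsonl_lines, key):
    """Tally the values of `key` across JSON-lines records.

    `jsonl_lines` is any iterable of strings (an open file works).
    Records that lack the key are counted under "<missing>".
    """
    counts = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        counts[record.get(key, "<missing>")] += 1
    return counts


# Hypothetical sample records; the real field names may differ.
sample = [
    '{"kind": "attempt", "probe": "example.ProbeA"}',
    '{"kind": "attempt", "probe": "example.ProbeB"}',
    '{"kind": "eval", "probe": "example.ProbeA"}',
    "",
]

print(count_by_key(sample, "kind"))
```

For a real report, open the file and pass the handle directly, for example `with open(path, encoding="utf-8") as fh: count_by_key(fh, key)`, where `path` is the `.jsonl` file named in the run output and `key` is a field that actually appears in it.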

Send PRs & open issues. Happy hunting!

## Intro to generators

### Hugging Face

Using the Pipeline API:
* `--target_type huggingface` (for transformers models to run locally)
*…
