Model · Mistral AI · published Jun 19, 2025 · seen 5d

mistralai/Mistral-Small-3.2-24B-Instruct-2506

license apache-2.0 · library vllm · params 24B · downloads 533k · likes 593

Mistral-Small-3.2-24B-Instruct-2506

Mistral-Small-3.2-24B-Instruct-2506 is a minor update of Mistral-Small-3.1-24B-Instruct-2503.

Small-3.2 improves in the following categories:

  • Instruction following: Small-3.2 is better at following precise instructions
  • Repetition errors: Small-3.2 produces fewer infinite generations and repetitive answers
  • Function calling: Small-3.2's function calling template is more robust (see here and [examples](#function-calling))

In all other categories, Small-3.2 should match or slightly improve on Mistral-Small-3.1-24B-Instruct-2503.

Key Features

Benchmark Results

We compare Mistral-Small-3.2-24B to Mistral-Small-3.1-24B-Instruct-2503. For more comparisons against other models of similar size, please check the Mistral-Small-3.1 benchmarks.

Text

Instruction Following / Chat / Tone

| Model | Wildbench v2 | Arena Hard v2 | IF (Internal; accuracy) |
|-------|---------------|---------------|------------------------|
| Small 3.1 24B Instruct | 55.6% | 19.56% | 82.75% |
| Small 3.2 24B Instruct | 65.33% | 43.1% | 84.78% |

Infinite Generations

Small 3.2 reduces infinite generations by roughly 1.6x (from 2.11% to 1.29%) on challenging, long and repetitive prompts.

| Model | Infinite Generations (Internal; Lower is better) |
|-------|-------|
| Small 3.1 24B Instruct | 2.11% |
| Small 3.2 24B Instruct | 1.29% |
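
As a quick arithmetic check on the table above, the two published rates can be turned into a reduction factor and a relative drop. This is a minimal sketch; the helper name `reduction_factor` is illustrative and not part of any Mistral tooling.

```python
def reduction_factor(before_pct: float, after_pct: float) -> float:
    """Return how many times smaller `after_pct` is than `before_pct`."""
    if after_pct <= 0:
        raise ValueError("after_pct must be positive")
    return before_pct / after_pct


# Rates taken from the table above (internal benchmark, lower is better).
small_3_1 = 2.11
small_3_2 = 1.29

factor = reduction_factor(small_3_1, small_3_2)
relative_drop = (small_3_1 - small_3_2) / small_3_1

print(f"reduction factor: {factor:.2f}x")
print(f"relative drop: {relative_drop:.1%}")
```

On these two figures the factor works out to about 1.64x, which corresponds to a relative drop of roughly 39%.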

STEM

| Model | MMLU | MMLU Pro (5-shot CoT) | MATH | GPQA Main (5-shot CoT) | GPQA Diamond (5-shot CoT) | MBPP Plus - Pass@5 | HumanEval Plus - Pass@5 | SimpleQA (TotalAcc) |
|-------|------|-----------------------|------|------------------------|---------------------------|--------------------|-------------------------|---------------------|
| Small 3.1 24B Instruct | 80.62% | 66.76% | 69.30% | 44.42% | 45.96% | 74.63% | 88.99% | 10.43% |
| Small 3.2 24B Instruct | 80.50% | 69.06% | 69.42% | 44.22% | 46.13% | 78.33% | 92.90% | 12.10% |

Vision

| Model | MMMU | Mathvista | ChartQA | DocVQA | AI2D |
|-------|------|-----------|---------|--------|------|
| Small 3.1 24B Instruct | 64.00% | 68.91% | 86.24% | 94.08% | 93.72% |
| Small 3.2 24B Instruct | 62.50% | 67.09% | 87.4% | 94.86% | 92.91% |

Usage

The model can be used with the following frameworks:

Note 1: We recommend using a relatively low temperature, such as temperature=0.15.

Note 2: Make sure to add a system prompt to the model to best tailor it to your needs. If you want to use the model as a general assistant, we recommend using the one provided in the SYSTEM_PROMPT.txt file.
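
The two notes can be folded into a small request builder. The following is a minimal sketch, assuming an OpenAI-style chat completion payload; `build_chat_request` and the placeholder system prompt are hypothetical and only illustrate where the temperature and system prompt go.

```python
from typing import Any

RECOMMENDED_TEMPERATURE = 0.15  # value suggested in Note 1


def build_chat_request(
    model: str,
    system_prompt: str,
    user_text: str,
    temperature: float = RECOMMENDED_TEMPERATURE,
) -> dict[str, Any]:
    """Assemble keyword arguments for an OpenAI-style chat completion call."""
    if not system_prompt.strip():
        raise ValueError("a non-empty system prompt is recommended (Note 2)")
    return {
        "model": model,
        "temperature": temperature,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_text},
        ],
    }


request = build_chat_request(
    model="mistralai/Mistral-Small-3.2-24B-Instruct-2506",
    system_prompt="You are a concise technical assistant.",  # placeholder text
    user_text="Summarize the difference between bf16 and fp16.",
)
print(request["temperature"], request["messages"][0]["role"])
```

Placing the system message first keeps it ahead of every user turn, which is the usual convention for chat-style APIs.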

vLLM (recommended)

We recommend using this model with vLLM.

Installation

Make sure to install `vLLM >= 0.9.1`:

```sh
pip install vllm --upgrade
```

Doing so should automatically install `mistral_common >= 1.6.2`.

To check:

```sh
python -c "import mistral_common; print(mistral_common.__version__)"
```
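
Both minimum versions can also be checked from Python in one pass. This is a rough sketch using only the standard library; the version parsing is deliberately simplified (it is not a full PEP 440 comparison) and the helper names are illustrative.

```python
from importlib import metadata


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.6.2' into (1, 6, 2), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare dotted versions numerically (simplified, not full PEP 440)."""
    return parse_version(installed) >= parse_version(minimum)


def check_package(name: str, minimum: str) -> str:
    """Report whether an installed distribution satisfies the minimum version."""
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        return f"{name}: not installed"
    status = "ok" if meets_minimum(installed, minimum) else "too old"
    return f"{name} {installed}: {status} (need >= {minimum})"


# Minimum versions as stated in the installation notes above.
for package, minimum in [("vllm", "0.9.1"), ("mistral_common", "1.6.2")]:
    print(check_package(package, minimum))
```

Comparing integer tuples avoids the string-comparison trap where `"0.10.0"` would sort before `"0.9.1"`.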

Alternatively, you can use a ready-to-go Docker image, which is also available on Docker Hub.

Serve

We recommend that you use Mistral-Small-3.2-24B-Instruct-2506 in a server/client setting.

1. Spin up a server:

```sh
vllm serve mistralai/Mistral-Small-3.2-24B-Instruct-2506 \
  --tokenizer_mode mistral --config_format mistral \
  --load_format mistral --tool-call-parser mistral \
  --enable-auto-tool-choice --limit-mm-per-prompt '{"image":10}' \
  --tensor-parallel-size 2
```
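
The `--tool-call-parser mistral` and `--enable-auto-tool-choice` flags in the command above enable function calling on the server side. On the client side, a tool is usually described in the OpenAI-compatible `tools` format. The sketch below only shows the payload shape; `get_weather` and its parameters are hypothetical and do not come from the model card.

```python
import json
from typing import Any


def make_tool(name: str, description: str, parameters: dict[str, Any]) -> dict[str, Any]:
    """Wrap a JSON Schema parameter spec in the OpenAI-style tool envelope."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,
        },
    }


# Hypothetical tool, used only to illustrate the payload shape.
weather_tool = make_tool(
    name="get_weather",
    description="Return the current weather for a city.",
    parameters={
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
)

# These keyword arguments would be passed to client.chat.completions.create(...).
request_kwargs = {"tools": [weather_tool], "tool_choice": "auto"}
print(json.dumps(request_kwargs, indent=2))
```

With `tool_choice="auto"`, the model decides whether to answer directly or to emit a tool call.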

Note: Running Mistral-Small-3.2-24B-Instruct-2506 on GPU requires ~55 GB of GPU RAM in bf16 or fp16.
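
The ~55 GB figure can be sanity-checked with back-of-the-envelope arithmetic. This sketch assumes exactly 24e9 parameters (the "24B" in the model name is a rounded count) and 2 bytes per parameter, which holds for both bf16 and fp16. What the remaining headroom is spent on is not stated in the note; KV cache, activations and runtime overhead are the usual candidates.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


params = 24e9                          # assumption: "24B" taken as exactly 24e9
weights_gb = weight_memory_gb(params)  # bf16 and fp16 both use 2 bytes per value
quoted_total_gb = 55                   # figure from the note above
headroom_gb = quoted_total_gb - weights_gb

print(f"weights: {weights_gb:.0f} GB, remaining headroom: {headroom_gb:.0f} GB")
```

Under these assumptions the weights alone take about 48 GB, which is consistent with splitting the model across two GPUs via `--tensor-parallel-size 2`.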

2. To query the server you can use a simple Python snippet. See the following examples.
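
Before sending prompts, it can help to confirm that the server from step 1 is up. The sketch below assumes the server listens on `http://localhost:8000` and exposes the OpenAI-compatible `/v1/models` route; the helper names are illustrative.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumed default address of the vLLM server


def parse_model_ids(payload: str) -> list[str]:
    """Extract model ids from an OpenAI-style /models JSON response."""
    data = json.loads(payload)
    return [entry["id"] for entry in data.get("data", [])]


def served_models(base_url: str = BASE_URL, timeout: float = 5.0) -> list[str]:
    """Return the served model ids, or an empty list if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            return parse_model_ids(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError):
        return []


if __name__ == "__main__":
    models = served_models()
    print(models if models else "server not reachable yet")
```

Loading a 24B model can take a while, so an empty result right after launch usually just means the server is still starting.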

Vision reasoning

Leverage the vision capabilities of Mistral-Small-3.2-24B-Instruct-2506 to make the best choice given a scenario. Go catch them all!

Python snippet

```python
from datetime import datetime, timedelta

from openai import OpenAI
from huggingface_hub import hf_hub_download

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

TEMP = 0.15
MAX_TOK = 131072

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

# Use the first model reported by the server.
models = client.models.list()
model = models.data[0].id


def load_system_prompt(repo_id: str, filename: str) -> str:
    """Download the system prompt template and fill in its name and date fields."""
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    today = datetime.today().strftime("%Y-%m-%d")
    yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")
    model_name = repo_id.split("/")[-1]
    return system_prompt.format(name=model_name, today=today, yesterday=yesterday)


model_id = "mistralai/Mistral-Small-3.2-24B-Instruct-2506"
SYSTEM_PROMPT = load_system_prompt(model_id, "SYSTEM_PROMPT.txt")
image_url = "https://static.wikia.nocookie.net/essentialsdocs/images/7/70/Battle.png/revision/latest?cb=20220523172438"

# One system message plus a user message that mixes text and an image URL.
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What action do you think I should take in this situation? List all the possible actions and explain why you think they are good or bad.",
            },
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    },
]

response =…
```
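
The snippet above is cut off at the request itself, and the elided part is not reproduced here. As an independent, minimal sketch of how a chat request is typically sent through the OpenAI Python client, a helper could look like the following. The function name `ask` and the `max_tokens` default of 1024 are assumptions for illustration, not values from the model card.

```python
from typing import Any


def ask(
    client: Any,
    model: str,
    messages: list[dict[str, Any]],
    temperature: float = 0.15,
    max_tokens: int = 1024,  # assumed cap, chosen only for this sketch
) -> str:
    """Send one chat completion request and return the assistant's text."""
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
        max_tokens=max_tokens,
    )
    return response.choices[0].message.content


# Usage with the objects defined in the snippet above:
#     print(ask(client, model, messages, temperature=TEMP))
```

Taking the client as a parameter keeps the helper independent of how the server address and API key were configured.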

Excerpt shown — open the source for the full document.
