ModelMistral AIMistral AIpublished Nov 30, 2025seen 5d

mistralai/Mistral-Large-3-675B-Base-2512

Open original ↗

Captured source

source ↗
published Nov 30, 2025seen 5dcaptured 9hhttp 200method plainlicense apache-2.0library vllmdownloads 23likes 43

Mistral Large 3 675B Base 2512

From our family of large models, Mistral Large 3 is a state-of-the-art general-purpose Multimodal granular Mixture-of-Experts model with 41B active parameters and 675B total parameters trained from scratch with 3000 H200s.

This model is the base pre-trained version, not fine-tuned for instruction or reasoning tasks, making it ideal for custom post-training processes. Designed for reliability and long-context comprehension - It is engineered for production-grade assistants, retrieval-augmented systems, scientific workloads, and complex enterprise workflows.

Mistral Large 3 Instruct is deployable on-premises in:

  • FP8 on a single node of B200s or H200s.
  • NVFP4 on a single node of H100s or A100s.

Key Features

Mistral Large 3 consists of two main architectural components:

  • A Granular MoE Language Model with 673B params and 39B active
  • A 2.5B Vision Encoder

The Mistral Large 3 Base model offers the following capabilities:

  • Vision: Enables the model to analyze images and provide insights based on visual content, in addition to text.
  • Multilingual: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic.
  • Frontier: Delivers best-in-class performance.
  • Apache 2.0 License: Open-source license allowing usage and modification for both commercial and non-commercial purposes.
  • Large Context Window: Supports a 256k context window.

Use Cases

With powerful long-context performance, stable and consistent cross-domain behavior, Mistral Large 3 is perfect for:

  • Long Document Understanding
  • Powerful Daily-Driver AI Assistants
  • State-of-the-Art Agentic and Tool-Use Capabilities
  • Enterprise Knowledge Work
  • General Coding Assistant

And enterprise-grade use cases requiring frontier capabilities.

Recommended Settings

We recommend deploying Large 3 in a client-server configuration with the following best practices:

  • System Prompt: Define a clear environment and use case, including guidance on how to effectively leverage tools in agentic systems.
  • Sampling Parameters: Use a temperature below 0.1 for daily-driver and production environments ; Higher temperatures may be explored for creative use cases - developers are encouraged to experiment with alternative settings.
  • Tools: Keep the set of tools well-defined and limit their number to the minimum required for the use case - Avoiding overloading the model with an excessive number of tools.
  • Vision: When deploying with vision capabilities, we recommend maintaining an aspect ratio close to 1:1 (width-to-height) for images. Avoiding the use of overly thin or wide images - crop them as needed to ensure optimal performance.

Known Issues / Limitations

  • Not a dedicated reasoning model: Dedicated reasoning models can outperform Mistral Large 3 in strict reasoning use cases.
  • Behind vision-first models in multimodal tasks: Mistral Large 3 can lag behind models optimized for vision tasks and use cases.
  • Complex deployment: Due to its large size and architecture, the model can be challenging to deploy efficiently with constrained resources or at scale.

Benchmark Results

We compare Mistral Large 3 to similar sized models.

!image

!image

!image

Instruct Usage

The Instruct model can be used with the following frameworks;

vLLM

We recommend using this model with vLLM.

Installation

Make sure to install vllm >= 1.12.0:

pip install vllm --upgrade

Doing so should automatically install `mistral_common >= 1.8.6`.

To check:

python -c "import mistral_common; print(mistral_common.__version__)"

You can also make use of a ready-to-go docker image or on the docker hub.

Serve

The Mistral Large 3 Instruct FP8 format can be used on one 8xH200 node. We recommend to use this format if you plan to fine-tuning as it can be more precise than NVFP4 in some situations.

A simple launch command is:

vllm serve mistralai/Mistral-Large-3-675B-Instruct-2512 \
--tensor-parallel-size 8 \
--enable-auto-tool-choice --tool-call-parser mistral

Key parameter notes:

  • enable-auto-tool-choice: Required when enabling tool usage.
  • tool-call-parser mistral: Required when enabling tool usage.

Additional flags:

  • You can set --max-model-len to preserve memory. By default it is set to 262144 which is quite large but not necessary for most scenarios.
  • You can set --max-num-batched-tokens to balance throughput and latency, higher means higher throughput but higher latency.

Usage of the model

Here we asumme that the model mistralai/Mistral-Large-3-675B-Instruct-2512 is served and you can ping it to the domain localhost with the port 8000 which is the default for vLLM.

Vision Reasoning

Let's see if Mistral Large 3 knows when to pick a fight !

from datetime import datetime, timedelta

from openai import OpenAI
from huggingface_hub import hf_hub_download

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

TEMP = 0.15
MAX_TOK = 262144

client = OpenAI(
api_key=openai_api_key,
base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id

def load_system_prompt(repo_id: str, filename: str) -> str:
file_path = hf_hub_download(repo_id=repo_id, filename=filename)
with open(file_path, "r") as file:
system_prompt = file.read()
today = datetime.today().strftime("%Y-%m-%d")
yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")
model_name = repo_id.split("/")[-1]
return system_prompt.format(name=model_name,…

Excerpt shown — open the source for the full document.

Notability

notability 6.0/10

Major model release but minimal community traction