mistralai/Mistral-Large-3-675B-Instruct-2512
Captured source
source ↗Mistral Large 3 675B Instruct 2512
From our family of large models, Mistral Large 3 is a state-of-the-art general-purpose Multimodal granular Mixture-of-Experts model with 41B active parameters and 675B total parameters trained from the ground up with 3000 H200s.
This model is the instruct post-trained version in FP8, fine-tuned for instruction tasks, making it ideal for chat, agentic and instruction based use cases. Designed for reliability and long-context comprehension - It is engineered for production-grade assistants, retrieval-augmented systems, scientific workloads, and complex enterprise workflows.
Learn more in our blog post here.
Mistral Large 3 is deployable on-premises in:
- FP8 on a single node of B200s or H200s.
- NVFP4 on a single node of H100s or A100s.
We provide a BF16 version if needed.
Key Features
Mistral Large 3 consists of two main architectural components:
- A Granular MoE Language Model with 673B params and 39B active
- A 2.5B Vision Encoder
The Mistral Large 3 Instruct model offers the following capabilities:
- Vision: Enables the model to analyze images and provide insights based on visual content, in addition to text.
- Multilingual: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic.
- System Prompt: Maintains strong adherence and support for system prompts.
- Agentic: Offers best-in-class agentic capabilities with native function calling and JSON outputting.
- Frontier: Delivers best-in-class performance.
- Apache 2.0 License: Open-source license allowing usage and modification for both commercial and non-commercial purposes.
- Large Context Window: Supports a 256k context window.
Use Cases
With powerful long-context performance, stable and consistent cross-domain behavior, Mistral Large 3 is perfect for:
- Long Document Understanding
- Powerful Daily-Driver AI Assistants
- State-of-the-Art Agentic and Tool-Use Capabilities
- Enterprise Knowledge Work
- General Coding Assistant
And enterprise-grade use cases requiring frontier capabilities.
Recommended Settings
We recommend deploying Large 3 in a client-server configuration with the following best practices:
- System Prompt: Define a clear environment and use case, including guidance on how to effectively leverage tools in agentic systems.
- Sampling Parameters: Use a temperature below 0.1 for daily-driver and production environments ; Higher temperatures may be explored for creative use cases - developers are encouraged to experiment with alternative settings.
- Tools: Keep the set of tools well-defined and limit their number to the minimum required for the use case - Avoiding overloading the model with an excessive number of tools.
- Vision: When deploying with vision capabilities, we recommend maintaining an aspect ratio close to 1:1 (width-to-height) for images. Avoiding the use of overly thin or wide images - crop them as needed to ensure optimal performance.
Known Issues / Limitations
- Not a dedicated reasoning model: Dedicated reasoning models can outperform Mistral Large 3 in strict reasoning use cases.
- Behind vision-first models in multimodal tasks: Mistral Large 3 can lag behind models optimized for vision tasks and use cases.
- Complex deployment: Due to its large size and architecture, the model can be challenging to deploy efficiently with constrained resources or at scale.
Benchmark Results
We compare Mistral Large 3 to similar sized models.
Usage
The model can be used with the following frameworks;
- `vllm`: See [here](#vllm)
> [!Note] > We sadly didn't have enough time to add Mistral Large 3 to transformers, but we would be very happy for a community contribution by opening a PR to huggingface/transformers.
vLLM
We recommend using this model with vLLM.
Installation
Make sure to install vllm >= 1.12.0:
pip install vllm --upgrade
Doing so should automatically install `mistral_common >= 1.8.6`.
To check:
python -c "import mistral_common; print(mistral_common.__version__)"
You can also make use of a ready-to-go docker image or on the docker hub.
Serve
The Mistral Large 3 Instruct FP8 format can be used on one 8xH200 node. We recommend to use this format if you plan to fine-tuning as it can be more precise than NVFP4 in some situations.
Simple
A simple launch command is:
vllm serve mistralai/Mistral-Large-3-675B-Instruct-2512 \ --max-model-len 262144 --tensor-parallel-size 8 \ --tokenizer_mode mistral --config_format mistral --load_format mistral \ --enable-auto-tool-choice --tool-call-parser mistral
Key parameter notes:
- enable-auto-tool-choice: Required when enabling tool usage.
- tool-call-parser mistral: Required when enabling tool usage.
Additional flags:
- You can set
--max-model-lento preserve memory. By default it is set to262144which is quite large but not necessary for most scenarios. - You can set
--max-num-batched-tokensto balance throughput and latency, higher means higher throughput but higher latency.
Accelerated with speculative decoding
For maximum performance we recommend serving the checkpoint with its customized draft model Mistral-Large-3-675B-Instruct-2512-Eagle:
vllm serve mistralai/Mistral-Large-3-675B-Instruct-2512 \ --tensor-parallel-size 8 \ --load-format mistral \ --tokenizer-mode mistral \ --config-format mistral \ --enable-auto-tool-choice…
Excerpt shown — open the source for the full document.
Notability
notability 8.0/10Major model release, modest traction.