What does this repo signal mean?

Mistral AI published mistralai/mistral-evals (Python). This repository signal exposes tooling, eval, infrastructure, or model-adjacent work before it may appear in a launch post. High-signal details: repo mistralai/mistral-evals · language Python · Notable repo from Mistral, moderate stars. onlylabs links this event to 1 captured evidence page and 6 related repo signals. It also maps to Evals and quality in the data-business radar.

Mistral AI Repo: mistralai/mistral-evals

Captured source

source ↗

GitHub/github.com/mistralai/mistral-evals

mistralai/mistral-evals repository metadata

Source ↗

published Sep 13, 2024seen Jun 5captured Jun 11http 200method plain

mistralai/mistral-evals

Language: Python

Stars: 87

Forks: 14

Open issues: 8

Created: 2024-09-13T18:11:29Z

Pushed: 2025-11-21T10:21:09Z

Default branch: main

Fork: no

Archived: no

README:

Mistral Evals

This repository contains code to run evals released by Mistral AI as well as standardized prompts, parsing and metrics computation for popular academic benchmarks.

Installation

pip install -r requirements.txt

Evals

We support the following evals in this repository:

mm_mt_bench: MM-MT-Bench is a multi-turn LLM-as-a-judge evaluation task released by Mistral AI that uses GPT-4o for judging model answers given reference answers.
vqav2: VQAv2
docvqa: DocVQA
mathvista: MathVista
mmmu: MMMU
chartqa: ChartQA

Example usage:

Step 1: Host a model using vLLM

To install vLLM, follow the directions here.

>> vllm serve mistralai/Pixtral-12B-2409 --config_format mistral --tokenizer_mode "mistral"

Step 2: Evaluate hosted model.

>> python -m eval.run eval_vllm \
--model_name mistralai/Pixtral-12B-2409 \
--url http://0.0.0.0:8000 \
--output_dir ~/tmp \
--eval_name "mm_mt_bench"

NOTE: Evaluating MM-MT-Bench requires calls to GPT-4o as a judge, hence you'll need to set the OPENAI_API_KEY environment variable for the eval to work.

For evaluating the other supported evals, see the Evals section.

Evaluating a non-vLLM model

To evaluate your own model, you can also create a Model class which implements a __call__ method which takes as input a chat completion request and returns a string answer. Requests are provided in vLLM API format.

class CustomModel(Model):

def __call__(self, request: dict[str, Any]):
# Your model code
...
return answer

Usage

*You must not use this library or our models in a manner that infringes, misappropriates, or otherwise violates any third party’s rights, including intellectual property rights.*

Notability

notability 6.0/10

Notable repo from Mistral, moderate stars