RepoMistral AIMistral AIpublished Sep 13, 2024seen 6d

mistralai/mistral-evals

Python

Open original ↗

Captured source

source ↗
published Sep 13, 2024seen 6dcaptured 8hhttp 200method plain

mistralai/mistral-evals

Language: Python

Stars: 87

Forks: 14

Open issues: 8

Created: 2024-09-13T18:11:29Z

Pushed: 2025-11-21T10:21:09Z

Default branch: main

Fork: no

Archived: no

README:

Mistral Evals

This repository contains code to run evals released by Mistral AI as well as standardized prompts, parsing and metrics computation for popular academic benchmarks.

Installation

pip install -r requirements.txt

Evals

We support the following evals in this repository:

  • mm_mt_bench: MM-MT-Bench is a multi-turn LLM-as-a-judge evaluation task released by Mistral AI that uses GPT-4o for judging model answers given reference answers.
  • vqav2: VQAv2
  • docvqa: DocVQA
  • mathvista: MathVista
  • mmmu: MMMU
  • chartqa: ChartQA

Example usage:

Step 1: Host a model using vLLM

To install vLLM, follow the directions here.

>> vllm serve mistralai/Pixtral-12B-2409 --config_format mistral --tokenizer_mode "mistral"

Step 2: Evaluate hosted model.

>> python -m eval.run eval_vllm \
--model_name mistralai/Pixtral-12B-2409 \
--url http://0.0.0.0:8000 \
--output_dir ~/tmp \
--eval_name "mm_mt_bench"

NOTE: Evaluating MM-MT-Bench requires calls to GPT-4o as a judge, hence you'll need to set the OPENAI_API_KEY environment variable for the eval to work.

For evaluating the other supported evals, see the Evals section.

Evaluating a non-vLLM model

To evaluate your own model, you can also create a Model class which implements a __call__ method which takes as input a chat completion request and returns a string answer. Requests are provided in vLLM API format.

class CustomModel(Model):

def __call__(self, request: dict[str, Any]):
# Your model code
...
return answer

Usage

*You must not use this library or our models in a manner that infringes, misappropriates, or otherwise violates any third party’s rights, including intellectual property rights.*

Notability

notability 6.0/10

Notable repo from Mistral, moderate stars