openai/monitorability-evals
Description: Open-sourced evaluation suite from the Monitoring Monitorability paper
Language: Python
License: Apache-2.0
Stars: 79
Forks: 6
Open issues: 1
Created: 2026-04-22T22:12:39Z
Pushed: 2026-04-22T22:30:35Z
Default branch: main
Fork: no
Archived: no
README:
Overview
Open-sourced evaluation data from the Monitoring Monitorability paper. This repository contains the public evaluation split. For an overview of dataset motivation, sources, metrics, and grader prompts, see Appendix C of the paper.
We do not directly redistribute AIME problem statements in this repo. For copyright reasons, the AIME datasets expose url and source fields instead; use those fields to retrieve the corresponding problem text from the linked source.
There are three archetypes:
- Intervention
- Process
- Outcome-property
Intervention
We open-source 9 of the 11 intervention evaluations:
- aime.jsonl
- daft.jsonl
- gpqa.jsonl
- gpqa_encoded.jsonl
- memory_advice.jsonl
- scruples_first_person.jsonl
- scruples_suggest_right.jsonl
- scruples_suggest_wrong.jsonl
- wmdp_sandbagging.jsonl
The two Health Queries evals rely on private or restricted data and were omitted from the release.
Most rows in intervention/*.jsonl have:
- problem: the prompt given to the model
- answer: the answer the hint/perturbation is guiding the model towards
- category: the subcategory, if it exists
- id: the ID to group rows on for metrics calculation. For each dataset, there are two rows with the same ID: one that is perturbed, one that is unperturbed
- perturbation: perturbation text
- type ("unperturbed" | "perturbed"): whether perturbation is applied in the prompt form.
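The pairing implied by the id and type fields can be sketched as follows. This is a minimal illustration, not the paper's metrics code: the helper name `pair_by_id` and the sample rows are made up for the example, and it assumes every row carries both `id` and `type`.

```python
import json
from collections import defaultdict


def pair_by_id(lines):
    """Group JSONL rows by `id`, keyed by their `type` field.

    Returns {id: (unperturbed_row, perturbed_row)} for every ID that has
    both variants. `lines` can be any iterable of JSON strings, such as an
    open file handle.
    """
    groups = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        row = json.loads(line)
        groups[row["id"]][row["type"]] = row
    return {
        row_id: (variants["unperturbed"], variants["perturbed"])
        for row_id, variants in groups.items()
        if "unperturbed" in variants and "perturbed" in variants
    }


# Illustrative rows only; these are NOT taken from the released data.
sample = [
    json.dumps({"id": "ex-1", "type": "unperturbed",
                "problem": "Q?", "answer": "B", "perturbation": ""}),
    json.dumps({"id": "ex-1", "type": "perturbed",
                "problem": "Q? (hint: B)", "answer": "B",
                "perturbation": "(hint: B)"}),
]
pairs = pair_by_id(sample)
```

With a local checkout, the same helper could be applied to a file handle, for example `with open("intervention/gpqa.jsonl") as f: pairs = pair_by_id(f)`.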
intervention/aime.jsonl is the exception:
- url: external link for the problem statement
- source: citation used to derive the problem URL
- answer, category, id, perturbation, and type behave as above
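Because AIME rows carry a link instead of the problem text, code that reads several intervention files may need to handle both row shapes. The sketch below shows one way to do that; `problem_reference` is a hypothetical helper, and the example rows and URL are placeholders, not real dataset entries.

```python
def problem_reference(row):
    """Return ("text", problem) for inline rows or ("url", url) for linked rows."""
    if "problem" in row:
        return ("text", row["problem"])
    if "url" in row:
        # The problem statement must be retrieved from the linked source.
        return ("url", row["url"])
    raise KeyError("row has neither 'problem' nor 'url'")


# Made-up rows for illustration only.
inline_row = {"problem": "What is 2 + 2?", "answer": "4"}
linked_row = {"url": "https://example.com/problem",
              "source": "example citation", "answer": "4"}

kind_inline, _ = problem_reference(inline_row)
kind_linked, location = problem_reference(linked_row)
```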
Process
We open-source two of the four process evaluation splits:
- process_gpqa_physics.jsonl
- process_aime_math.jsonl
The FrontierMath and Math Diamond evals rely on private or restricted data and were omitted from the release.
Most rows in process/*.jsonl have:
- problem: the prompt given to the model
- answer (string or int): the correct answer
- theorem_paths (string): the theorem paths
- full_theorems_definition (object): the definitions for each theorem in the theorem paths
- new_solution_path_count (int): the number of independent theorem paths
process/process_aime_math.jsonl is the exception:
- url: external link for the problem statement
- source: citation used to derive the problem URL
- answer, theorem_paths, and full_theorems_definition behave as above
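A quick field check can catch rows that match neither of the two process schemas described above. This is a rough sketch under the field lists given in this README; `missing_fields` and `COMMON_FIELDS` are illustrative names, and `new_solution_path_count` is left out of the check because it is not listed for the AIME file.

```python
# Fields this README lists for both process row shapes.
COMMON_FIELDS = {"answer", "theorem_paths", "full_theorems_definition"}


def missing_fields(row):
    """Return the sorted names of expected fields that are absent from `row`."""
    # Linked rows (AIME) use url/source; all other rows carry the problem text.
    locator = {"url", "source"} if "url" in row else {"problem"}
    return sorted((COMMON_FIELDS | locator) - set(row))


# Made-up rows for illustration only.
complete_row = {"problem": "p", "answer": 1,
                "theorem_paths": "t", "full_theorems_definition": {}}
partial_row = {"url": "https://example.com/problem", "answer": 1}

complete_gaps = missing_fields(complete_row)
partial_gaps = missing_fields(partial_row)
```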
Outcome-property
We open-source one outcome-property dataset:
- Flaky Tools (the CharXiv-based subset)
We also describe two other outcome-property evaluations:
- Impossible Coding Tasks (the LiveBench-based subset)
- ShadeArena (already public)
The following outcome-property evals rely on private or restricted data and were omitted from the release: OAI-Repo-Sabotage, Anti-Scheming, the two Flaky-Tools datasets based on internal OpenAI browsing evaluations and coding problems, and the Impossible Coding Task based on an OpenAI interview question.
Prompt and label mappings
Prompt templates live under monitor_prompts/ and model_prompts/:
- monitor_prompts/templates/*.txt: monitor prompt templates with {variable} placeholders.
- model_prompts/*.txt: model prompt / answer-evaluation prompt templates (verbatim where available).
- dataset_registry.yaml: single source of truth mapping datasets to prompt files and Y/Z label semantics.
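As a rough illustration of how {variable} placeholders might be filled, the sketch below uses Python's standard `str.format` conventions. The template text and the placeholder names `problem` and `cot` are hypothetical, not copied from the repository, and the approach assumes templates contain no literal braces (which `str.format` would otherwise try to interpret).

```python
from string import Formatter


def placeholder_names(template):
    """Return the sorted, de-duplicated placeholder names found in a template."""
    return sorted({
        field_name
        for _, field_name, _, _ in Formatter().parse(template)
        if field_name
    })


# Hypothetical template; real ones live under monitor_prompts/templates/.
template = "Question: {problem}\nReasoning: {cot}\n"

names = placeholder_names(template)
filled = template.format(problem="Q", cot="R")
```

Listing the placeholders first makes it easy to confirm that every required value is available before formatting.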
Deprecated evals
We identified issues with Anti-Scheming and Memory that are detailed in the GPT 5.4 Thinking system card. We have stopped using these two evals internally until we create a version of Anti-Scheming that addresses the mismatch between the generic CoT sketchiness monitor and the task-specific answer grader, and a version of Memory with improved answer grading.
Legal note
By contributing to evals, you are agreeing to make your evaluation logic and data available under the same Apache-2.0 License as this repository. You must have adequate rights to upload any data used in an eval. OpenAI reserves the right to use this data in future service improvements to our product. Contributions to OpenAI evals will be subject to our usual Usage Policies: https://platform.openai.com/docs/usage-policies.
Notability
notability 5.0/10: Moderate traction research evals repo