openai/model_spec_dataset
Description: A public-domain dataset of prompts and scenarios for evaluating compliance with the OpenAI Model Spec.
License: CC0-1.0
Stars: 9
Forks: 1
Open issues: 0
Created: 2026-03-24T19:48:01Z
Pushed: 2026-03-24T19:51:22Z
Default branch: main
Fork: no
Archived: no
README:
OpenAI Model Spec Eval Dataset
A public-domain dataset of prompts and scenarios for evaluating compliance with the OpenAI Model Spec as of 2025-12-18.
This dataset is intended to be run with the Model Spec Eval harness.
The dataset currently contains 596 prompts. Nine of them cannot be run through the public OpenAI API and are skipped by the harness. These examples rely on system messages, but the API effectively supports only developer messages. In the Inspect chat message format such messages are labeled "system", yet for newer models like the o-series and GPT-5.X they are sent to the OpenAI API as "developer" messages.
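The role translation described above can be sketched as follows. This is a minimal illustration, not the actual Inspect or harness code. The function name api_role and the list of model-name prefixes used to detect "newer models" are hypothetical assumptions.

```python
# Hypothetical sketch of the role translation described in the README.
# ASSUMPTION: "newer models" are detected by these model-name prefixes;
# the real harness/Inspect logic may differ.
NEWER_MODEL_PREFIXES = ("o1", "o3", "o4", "gpt-5")


def api_role(inspect_role: str, model: str) -> str:
    """Return the role that an Inspect-style message would carry on the OpenAI API."""
    if inspect_role == "system" and model.startswith(NEWER_MODEL_PREFIXES):
        # Newer models receive Inspect "system" messages as "developer" messages,
        # so a genuine system-level message cannot be expressed for them.
        return "developer"
    # All other roles pass through unchanged.
    return inspect_role
```

Under this assumption, api_role("system", "gpt-5.1") yields "developer". That is why prompts depending on a true system message get flagged for skipping.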
Each prompt contains some metadata:
- target: the rubric mentioned in Introducing Model Spec Evals, which tells the grader the crux of the prompt and what constitutes compliance.
- focus_id: the focus area in the model_spec.md file, in the form [^xxxx]. This is the focus directly tested by the prompt.
- section_id: the id of the immediate section in which the focus_id is found.
- sections: the chain of sections containing section_id.
- skip (if present): whether the prompt should be skipped for technical reasons, as mentioned above.
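A per-prompt record with these metadata fields, and the filtering of skipped prompts, might look like the sketch below. The field names come from the list above. The Python types, the representation of sections as a list of strings, and the helper names PromptRecord, from_dict and runnable are assumptions for illustration only. The dataset's actual on-disk format is not described here.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class PromptRecord:
    # Field names follow the README; the types are ASSUMPTIONS.
    target: str      # grading rubric: crux of the prompt and what counts as compliance
    focus_id: str    # focus area in model_spec.md, of the form [^xxxx] (exact string format assumed)
    section_id: str  # id of the immediate section containing focus_id
    sections: list[str] = field(default_factory=list)  # chain of enclosing sections (assumed to be a list)
    skip: bool = False  # only present for prompts that cannot run via the public API

    @classmethod
    def from_dict(cls, d: dict[str, Any]) -> "PromptRecord":
        """Build a record from a parsed mapping; a missing 'skip' means 'do not skip'."""
        return cls(
            target=d["target"],
            focus_id=d["focus_id"],
            section_id=d["section_id"],
            sections=list(d.get("sections", [])),
            skip=bool(d.get("skip", False)),
        )


def runnable(records: list[PromptRecord]) -> list[PromptRecord]:
    """Keep only prompts not flagged with skip, mirroring what the harness is described as doing."""
    return [r for r in records if not r.skip]
```

With the counts stated above, filtering this way would leave 596 - 9 = 587 prompts to run.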