What does this repo signal mean?

OpenAI published openai/frontier-evals, a Python repository. Repo signals like this one surface tooling, eval, infrastructure, or model-adjacent work, often before it appears in a launch post. High-signal details: repo openai/frontier-evals · language Python · new eval toolkit from OpenAI, moderate stars. onlylabs links this event to 1 captured evidence page and 6 related repo signals, and maps it to "Evals and quality" in the data-business radar.

OpenAI Repo: openai/frontier-evals

Captured source

source ↗

GitHub · github.com/openai/frontier-evals

openai/frontier-evals repository metadata

published Mar 28, 2025 · seen 6d · captured 8h · http 200 · method plain

openai/frontier-evals

Description: OpenAI Frontier Evals

Language: Python

License: MIT

Stars: 1219

Forks: 162

Open issues: 12

Created: 2025-03-28T15:45:01Z

Pushed: 2026-04-21T20:53:31Z

Default branch: main

Fork: no

Archived: no

README:

Frontier Evals

Code for evals measuring frontier model capabilities.

[PaperBench](./project/paperbench): End-to-end replication of state-of-the-art AI papers. Paper | Blog
[SWE-Lancer](./project/swelancer): Real freelance software engineering tasks with end-to-end tests. Paper | Blog
[EVMbench](./project/evmbench): Smart contract security tasks. Paper | Blog

Usage

Requirements

We manage environments with `uv`. Install `uv` once, then run `uv sync` (or `uv pip install -r ...`) inside the project of interest to create its virtual environment from the checked-in `uv.lock`.

Running Evals

Each eval directory documents how to reproduce runs, configure models, and interpret results. Start with the suite README.md, then consult any scripts under scripts/ or runtime_*/ directories for orchestration details. When in doubt:

1. cd into the eval directory.
2. uv sync to install dependencies.
3. Follow the local instructions in the README.md.

Contributing

Layout

.
├── pyproject.toml      # Shared tooling configuration (Ruff, Black, etc.)
└── project/
    ├── common/         # Shared libraries
    ├── evmbench/       # EVMBench eval
    ├── paperbench/     # PaperBench eval
    └── swelancer/      # SWE-Lancer eval

Each eval directory is its own isolated project with a README.md, pyproject.toml and uv.lock.
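
The layout above can be checked mechanically. The following is a minimal sketch, not part of the repository: it assumes a local clone with the `project/` directory shown in the tree, and the function name `find_eval_projects` is hypothetical. It reports, for each subdirectory, which of the three files named above are missing. The shared `common/` directory is not described as an eval, so it may legitimately report missing files.

```python
from pathlib import Path

# Files the README says every eval directory carries.
REQUIRED_FILES = ("README.md", "pyproject.toml", "uv.lock")


def find_eval_projects(repo_root: str) -> dict[str, list[str]]:
    """Map each directory under <repo_root>/project to the required files it lacks.

    Hypothetical helper for illustration; returns {} if there is no project/ directory.
    """
    project_dir = Path(repo_root) / "project"
    if not project_dir.is_dir():
        return {}
    report: dict[str, list[str]] = {}
    for child in sorted(project_dir.iterdir()):
        if not child.is_dir():
            continue
        # An empty list means the directory has all three files.
        report[child.name] = [
            name for name in REQUIRED_FILES if not (child / name).is_file()
        ]
    return report


if __name__ == "__main__":
    for name, missing in find_eval_projects(".").items():
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{name}: {status}")
```

Run from the repository root, this prints one line per directory under `project/`, which is a quick way to see which suites are self-contained projects before choosing one to sync.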

Development Workflow

Create or activate the environment for the project you are working on with uv. Example for PaperBench:
    cd project/paperbench
    uv sync
    uv run pytest
Code style and linting use Ruff (with autofix profiles in pyproject.toml and project/common/tooling/ruff_autofix_minimal.toml) and Black. Run uv run ruff check --fix or use the provided Poe/make tasks where available.
Shared utilities live under project/common; changes there may affect multiple evals. Bump the relevant editable dependencies if you create new shared subpackages.
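
The workflow commands quoted above can be chained from a script. This is a minimal sketch under stated assumptions: `uv` is installed and on `PATH`, the commands are run in the order the text lists them, and the names `WORKFLOW` and `run_workflow` are hypothetical rather than anything the repository provides. The runner is injectable so the sequence can be exercised without invoking `uv`.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence

# Commands quoted in the workflow text, in the order it mentions them.
WORKFLOW = (
    ("uv", "sync"),
    ("uv", "run", "pytest"),
    ("uv", "run", "ruff", "check", "--fix"),
)


def _run(cmd: Sequence[str], cwd: Path) -> int:
    """Default runner: execute the command in cwd and return its exit code."""
    return subprocess.run(list(cmd), cwd=cwd).returncode


def run_workflow(
    project: str,
    runner: Callable[[Sequence[str], Path], int] = _run,
) -> bool:
    """Run each workflow command inside the project; stop at the first failure."""
    cwd = Path(project)
    for cmd in WORKFLOW:
        print("$", " ".join(cmd))
        if runner(cmd, cwd) != 0:
            return False
    return True
```

Calling `run_workflow("project/paperbench")` from the repository root would mirror the PaperBench example. Stopping at the first non-zero exit code keeps a failed `uv sync` from being masked by later steps.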

Notability

notability 6.0/10

New eval toolkit from OpenAI, moderate stars.