RepoOpenAIOpenAIpublished Feb 9, 2026seen 6d

openai/parameter-golf

Python

Open original ↗

Captured source

source ↗
published Feb 9, 2026seen 6dcaptured 9hhttp 200method plain

openai/parameter-golf

Description: Train the smallest LM you can that fits in 16MB. Best model wins!

Language: Python

License: MIT

Stars: 5110

Forks: 3346

Open issues: 1472

Created: 2026-02-09T06:22:45Z

Pushed: 2026-05-04T20:07:19Z

Default branch: main

Fork: no

Archived: no

README:

OpenAI Model Craft Challenge: Parameter Golf is a challenge to train the best language model that fits in a 16MB artifact and trains in under 10 minutes on 8xH100s, evaluated by compression on the FineWeb validation set (tokenizer-agnostic, bits per byte).

This challenge is heavily inspired by the NanoGPT Speedrunning challenge, where participants compete to train a model that reaches 3.28 FineWeb validation loss as quickly as possible. We're excited to see how optimizing for a parameter-constrained setting pushes people toward unique architectures (test-time compute, aggressive parameter tying, depth recurrence, low-rank training, ...), compression schemes (low precision, QAT, bitnets, novel tokenizers, ...), and other creative submissions (test-time training, long context, megakernels ...).

If you're familiar with neural scaling laws, you can consider this challenge a form of L(N) optimization, where the objective is to optimize the lowest loss given a fixed number of parameters (N) unconstrained by data, compute, steps, or architecture. Challenges like the NanoGPT Speedrun, which optimizes for a form of L(T) (~lowest time given constrained loss) or the NanoGPT Slowrun, which optimizes for L(D) (lowest loss given constrained dataset size), can be thought of as equivalent challenges in this family.

Ideally, we'd allow for submissions to use arbitrary computational resources. But in order to make the challenge not inaccessibly expensive, we're limiting *leaderboard submissions* to 10 minutes on 8xH100s. However, we'd still love to see submissions that don't meet the compute limitation requirements in our 'Non-record Submissions' section: We're excited to see people push the infinite frontier of parameter limited performance as well.

We also know compute is expensive, so OpenAI is sponsoring $1,000,000 in compute credits to help people get started training their models. To request a compute grant, use this form: Request a Compute Grant. When requesting compute, please make sure you choose the appropriate level, write sufficient justification, and submit with an email tied to a OpenAI / ChatGPT account.

Participant Form

If you enjoy solving very difficult technical problems, please introduce yourself via the Challenge Participant Form. It helps us attribute challenge submissions and reach out about opportunities with OpenAI. _Completing the form is not required to participate._

Many researchers at OpenAI first distinguished themselves through elite mathematics and programming competitions. The Model Craft Challenge is designed in that spirit: testing the ability to tackle unfamiliar problems with creativity and rigor, qualities we believe are essential for frontier AI research.

In June, we plan to hire a small cohort of early-career researchers, targeting current undergraduate students and recent graduates, including Olympiad medalists and elite competitors. For exceptional participants, the challenge may also serve as a way to stand out to OpenAI researchers and recruiters.

The challenge runs from March 18th to April 30th.

Happy training!

Leaderboard

| Run | Score | Author | Summary | Date | Info | |-----|------:|--------|---------|------|------| | Calib32 Token-Only N-gram + AsymLogit Stack | 1.0565 | codemath3000 | On PR #2135: pre-cutoff PR #2130 architecture rerun on clean canonical CaseOps data with GPTQ_CALIBRATION_BATCHES=32; 3-seed mean 1.05651 under grace policy (p=0.014 vs PR #2014) | 2026-05-01 | info | | Progressive Context Growth + Short-Doc Score-First TTT | 1.0576 | simonbissonnette | On PR #2014: PR #1855/#1953 CaseOps stack with progressive context growth to 3k plus short-doc score-first TTT on the AWQ-lite/AsymLogit lineage; 3-seed mean 1.05759 (p=0.011 vs PR #1953) | 2026-04-30 | info | | Long-Context No-Q/V TTT + QK-Gain 5.25 | 1.0586 | andrewbaggio1 | On PR #1953: PR #1945 V21 base with 2560 eval/TTT context, no-Q/V TTT mask, TTT LR 0.75, and QK_GAIN_INIT=5.25; 3-seed mean 1.05855 (p=0.063 vs PR #1945 V21 v2) | 2026-04-30 | info | | AWQ-Lite GPTQ + AsymLogit on PR1855 Stack | 1.0594 | alertcat | On PR #1945 commit 70067534: PR #1855 stack plus PR #1908 AWQ-lite mixed GPTQ and PR #1923 AsymLogit; V21 v2 3-seed mean 1.05943 after strict seed-42 rerun (p=0.034 vs PR #1855) | 2026-04-29 | info, commit | | BOS-Fixed SmearGate + LQER + SparseAttnGate + 9-Hparam Stack | 1.0611 | codemath3000 | On PR #1855: BOS-fixed #1797-derived stack with LQER, PR #1787 SparseAttnGate/PolarNS/FusedCE base, per-group lrzip compression, and 9 greedy hyperparameter overrides; submitted 3-seed mean 1.06108 with broader reproduction support (p=0.188 vs PR #1868 latest rerun) | 2026-04-27 | info, repro | | BOS-Fixed SmearGate + LQER Asymmetric + PR1787 SparseAttn + Phased TTT | 1.0614 | aquariouseworkman | On PR #1851 with 3-seed compliance-rerun support from PR #1868: BOS-boundary fix from PR #1851 applied to dexhunter's PR #1797 SmearGate + LQER stack, using the PR #1787 SparseAttnGate/PolarNS/FusedCE base plus CaseOps and phased score-first TTT | 2026-04-27 | info, 3-seed | | PR1736 + PolarNS + MIN_LR + SparseAttnGate + FusedCE + Warm-A TTT | 1.0634 | nprime06 | On PR #1787: PR #1736 CaseOps stack plus Polar Express Newton-Schulz coefficients, MIN_LR=0.1, SparseAttnGate, fused softcapped CE, and PR #1767-style…

Excerpt shown — open the source for the full document.

Notability

notability 6.0/10

OpenAI repo with notable traction, not a major launch