WritingOpenAIOpenAIpublished Aug 7, 2025seen 6d

Introducing GPT-5 for developers

Open original ↗

Captured source

source ↗
published Aug 7, 2025seen 6dcaptured 3dhttp 200method exa

Introducing GPT‑5 for developers | OpenAI

August 7, 2025

Introducing GPT‑5 for developers

The best model for coding and agentic tasks.

Loading…

Share

Introduction

Today, we’re releasing GPT‑5 in our API platform—our best model yet for coding and agentic tasks.

GPT‑5 is state-of-the-art (SOTA) across key coding benchmarks, scoring 74.9% on SWE-bench Verified and 88% on Aider polyglot. We trained GPT‑5 to be a true coding collaborator. It excels at producing high-quality code and handling tasks such as fixing bugs, editing code, and answering questions about complex codebases. The model is steerable and collaborative—it can follow very detailed instructions with high accuracy and can provide upfront explanations of its actions before and between tool calls. The model also excels at front-end coding, beating OpenAI o3 at frontend web development 70% of the time in internal testing.

We trained GPT‑5 on real-world coding tasks in collaboration with early testers across startups and enterprises. Cursor says GPT‑5 is “the smartest model [they’ve] used” and “remarkably intelligent, easy to steer, and even has a personality [they] haven’t seen in other models.” Windsurf shared GPT‑5 is SOTA on their evals and “has half the tool calling error rate over other frontier models.” Vercel says “it’s the best frontend AI model, hitting top performance across both the aesthetic sense and the code quality, putting it in a category of its own.”

GPT‑5 also excels at long-running agentic tasks—achieving SOTA results on τ2-bench telecom (96.7%), a tool-calling benchmark released just 2 months ago. GPT‑5’s improved tool intelligence lets it reliably chain together dozens of tool calls—both in sequence and in parallel—without losing its way, making it far better at executing complex, real-world tasks end to end. It also follows tool instructions more precisely, is better at handling tool errors, and excels at long-context content retrieval. Manus says GPT‑5 “achieved the best performance [they’ve] ever seen from a single model on [their] internal benchmarks.” Notion says “[the model’s] rapid responses, especially in low reasoning mode, make GPT‑5 an ideal model when you need complex tasks solved in one shot.” Inditex shared “what truly sets [GPT‑5] apart is the depth of its reasoning: nuanced, multi-layered answers that reflect real subject-matter understanding.”

We’re introducing new features in our API to give developers more control over model responses. GPT‑5 supports a newverbosity parameter (values:low,medium,high) to help control whether answers are short and to the point or long and comprehensive. GPT‑5’sreasoning_effort parameter can now take a minimal value to get answers back faster, without extensive reasoning first. We’ve also added a new tool type—custom tools—to let GPT‑5 call tools with plaintext instead of JSON. Custom tools support constraining by developer-supplied context-free grammars.

We’re releasing GPT‑5 in three sizes in the API—gpt-5,gpt-5-mini, andgpt-5-nano—to give developers more flexibility to trade off performance, cost, and latency. While GPT‑5 in ChatGPT is a system of reasoning, non-reasoning, and router models, GPT‑5 in the API platform is the reasoning model that powers maximum performance in ChatGPT. Notably, GPT‑5 with minimal reasoning is a different model than the non-reasoning model in ChatGPT, and is better tuned for developers. The non-reasoning model used in ChatGPT is available asgpt-5-chat-latest.

To read about GPT‑5 in ChatGPT, and learn more about other ChatGPT improvements, see our research blog. For more on how enterprises are excited to use GPT‑5, see our enterprise blog⁠.

Coding

GPT‑5 is the strongest coding model we’ve ever released. It outperforms o3 across coding benchmarks and real-world use cases, and has been fine-tuned to shine in agentic coding products like Cursor, Windsurf, GitHub Copilot, and Codex CLI. GPT‑5 impressed our alpha testers, setting records on many of their private internal evals.

Early feedback on GPT‑5 for real-world coding tasks

> “GPT-5 is the smartest coding model we've used. Our team has found GPT-5 to be remarkably intelligent, easy to steer, and even to have a personality we haven’t seen in any other model. It not only catches tricky, deeply-hidden bugs but can also run long, multi-turn background agents to see complex tasks through to the finish—the kinds of problems that used to leave other models stuck. It’s become our daily driver for everything from scoping and planning PRs to completing end-to-end builds.”

Michael Truell, Co-Founder & CEO at Cursor

On SWE-bench Verified, an evaluation based on real-world software engineering tasks, GPT‑5 scores 74.9%, up from o3’s 69.1%. Notably, GPT‑5 achieves its high score with greater efficiency and speed: relative to o3 at high reasoning effort, GPT‑5 uses 22% fewer output tokens and 45% fewer tool calls.

In SWE-bench Verified⁠, a model is given a code repository and issue description, and must generate a patch to solve the issue. Text labels indicate the reasoning effort. Our scores omit 23 of 500 problems whose solutions did not reliably pass on our infrastructure. GPT‑5 was given a short prompt that emphasized verifying solutions thoroughly; the same prompt did not benefit o3.

On Aider polyglot, an evaluation of code editing, GPT‑5 sets a new record of 88%, a one-third reduction in error rate compared to o3.

In Aider polygot⁠(diff), a model is given a coding exercise from Exercism and must write its solution as a code diff. Reasoning models were run with high reasoning effort.

We’ve also found GPT‑5 to be excellent at digging deep into codebases to answer questions about how various pieces work or interoperate. In a codebase as complicated as OpenAI’s reinforcement learning stack, we’re finding that GPT‑5 can help us reason about and answer questions about our code, accelerating our own day-to-day work.

Frontend engineering

When producing frontend code for web apps, GPT‑5 is more aesthetically-minded, ambitious, and accurate. In side-by-side comparisons with o3, GPT‑5 was preferred by our testers 70% of the time.

Here are some fun, cherry-picked examples of what GPT‑5 can do with a single prompt:

Prompt: `Please generate a beautiful, realistic landing page for a service that provides the ultimate coffee enthusiast a $200/month subscription that provides equipment rental and coaching for coffee roasting and…

Excerpt shown — open the source for the full document.

Notability

notability 10.0/10

Major flagship model release