Introducing GPT-5.1 for developers
Captured source
source ↗Introducing GPT-5.1 for developers | OpenAI
November 13, 2025
Introducing GPT‑5.1 for developers
Loading…
Share
Today we’re releasing GPT‑5.1 in the API platform, the next model in the GPT‑5 series that balances intelligence and speed for a wide range of agentic and coding tasks. GPT‑5.1 dynamically adapts how much time it spends thinking based on the complexity of the task, making the model significantly faster and more token-efficient on simpler everyday tasks. The model also features a “no reasoning” mode to respond faster on tasks that don’t require deep thinking, while maintaining the frontier intelligence of GPT‑5.1.
To make GPT‑5.1 even more efficient, we’re releasing extended prompt caching for up to 24 hour cache retention, driving faster responses for follow-up questions at a lower cost. Our Priority Processing customers will also experience noticeably faster performance with GPT‑5.1 over GPT‑5.
On coding, we’ve worked closely with startups like Cursor, Cognition, Augment Code, Factory, and Warp to improve GPT‑5.1’s coding personality, steerability, and code quality. In general, GPT‑5.1 feels more intuitive to use for coding and more communicative with user-facing updates as it completes tasks.
Finally, we’re introducing two new tools with GPT‑5.1: anapply_patch tool designed to edit code more reliably and a shell tool to let the model run shell commands.
GPT‑5.1 is the next advancement in the GPT‑5 series, and we plan to continue to invest in more intelligent and capable models to help developers build reliable agentic workflows.
Efficient reasoning across tasks
Adaptive reasoning
To make GPT‑5.1 faster, we overhauled the way we trained it to think. On straightforward tasks, GPT‑5.1 spends fewer tokens thinking, enabling snappier product experiences and lower token bills. On difficult tasks that require extra thinking, GPT‑5.1 remains persistent, exploring options and checking its work in order to maximize reliability.
Balyasny Asset Management said GPT‑5.1 "outperformed both GPT‑4.1 and GPT‑5 in our full dynamic evaluation suite, while running 2-3x faster than GPT‑5." They also said across their tool-heavy reasoning tasks, GPT‑5.1 “consistently used about half as many tokens as leading competitors at similar or better quality.” Similarly, AI insurance BPO Pace also tested the model and said their agents run "50% faster on GPT‑5.1 while exceeding accuracy of GPT‑5 and other leading models across our evals."
GPT‑5.1 varies its thinking time more dynamically than GPT‑5. On a representative distribution of ChatGPT tasks, GPT‑5.1 is much faster at the easier tasks, even at high reasoning effort.
As an example, when asked "show an npm command to list globally installed packages", GPT‑5.1 answers in 2 seconds instead of 10 seconds.
GPT-5 (Medium) takes ~250 tokens (~10 seconds)
show an npm command to list globally installed packages
npm list -g --depth=0
GPT-5.1 (Medium) takes ~50 tokens (~2 seconds)
show an npm command to list globally installed packages
You can list globally installed npm packages with:
- Just package names: npm ls -g --depth=0
- Including their install paths: npm ls -g
The first one is usually what you want.
New “no reasoning” mode
Developers can now use GPT‑5.1 without reasoning by setting reasoning_effort to 'none'. This makes the model behave like a non-reasoning model for latency-sensitive use cases, with the high intelligence of GPT‑5.1 and added bonus of performant tool-calling. Relative to GPT‑5 with 'minimal' reasoning, GPT‑5.1 with no reasoning is better at parallel tool calling (which itself increases end-to-end task completion speed), coding tasks, following instructions, and using search tools—and supports web search in our API platform. Sierra shared that GPT‑5.1 on “no reasoning” mode showed a “20% improvement on low-latency tool calling performance compared to GPT‑5 minimal reasoning” in their real-world evals.
With the introduction of 'none' as a value in reasoning_effort, developers now have even more flexibility and control over the balance between speed, cost, and intelligence for their use case. GPT‑5.1 defaults to 'none', which is ideal for latency-sensitive workloads. We recommend developers choose 'low' or 'medium' for tasks of higher complexity and 'high' when intelligence and reliability matter more than speed.
Extended prompt caching
Extended caching improves reasoning efficiency by allowing prompts to remain active in the cache for up to 24 hours, rather than the few minutes supported today. With a longer retention window, more follow-up requests can leverage cached context—resulting in lower latency, reduced cost, and smoother performance for long-running interactions such as multi-turn chat, coding sessions, or knowledge retrieval workflows.
Prompt cache pricing remains unchanged, with cached input tokens 90% cheaper than uncached tokens, and no additional charge for cache writes or storage. To use extended caching with GPT‑5.1, add the parameter“prompt_cache_retention='24h'” on the Responses or Chat Completions API. See the prompt caching docs for more detail.
Coding
GPT‑5.1 builds on GPT‑5’s coding capabilities with a more steerable coding personality, less overthinking, improved code quality, better user-targeted update messages (preambles) during sequences of tool calls, and more functional frontend designs—especially at low reasoning effort.
On simpler coding tasks like quick code edits, GPT‑5.1’s faster speeds make it easier to iterate back and forth. GPT‑5.1’s faster speeds on simple tasks don’t degrade performance on difficult tasks. On SWE-bench Verified, GPT‑5.1 works even longer than GPT‑5 and reaches 76.3%.
In SWE-bench Verified, a model is given a code repository and issue description, and must generate a patch to solve the issue. Labels indicate reasoning effort. Accuracy is averaged across all 500 problems. All models used a harness with JSON-based apply_patch tool.
We got early feedback on GPT‑5.1 from a handful of coding companies. Here are their impressions:
- Augment Code called GPT‑5.1 “more deliberate with fewer wasted actions, more efficient reasoning, and better task focus” and they’re seeing “more accurate changes, smoother pull requests, and faster iteration across multi-file projects.”
- Cline shared that in their evals, “GPT‑5.1 achieved SOTA on our diff editing benchmark with a 7% improvement, demonstrating exceptional…
Excerpt shown — open the source for the full document.
Notability
notability 8.0/10New model from OpenAI, HN traction moderate