Introducing Codex
Captured source
source ↗Introducing Codex | OpenAI
May 16, 2025
Introducing Codex
A cloud-based software engineering agent that can work on many tasks in parallel, powered by codex-1. Available to ChatGPT Pro, Business, and Enterprise users today, and Plus users soon.
Loading…
Update on June 3, 2025: Codex is now available to ChatGPT Plus users. We’re also enabling users to provide Codex with internet access during task execution. Please refer to the changelog and docs for more details.
---
Today we’re launching a research preview of Codex: a cloud-based software engineering agent that can work on many tasks in parallel. Codex can perform tasks for you such as writing features, answering questions about your codebase, fixing bugs, and proposing pull requests for review; each task runs in its own cloud sandbox environment, preloaded with your repository.
Codex is powered by codex-1, a version of OpenAI o3 optimized for software engineering. It was trained using reinforcement learning on real-world coding tasks in a variety of environments to generate code that closely mirrors human style and PR preferences, adheres precisely to instructions, and can iteratively run tests until it receives a passing result. We’re starting to roll out Codex to ChatGPT Pro, Enterprise, and Business users today, with support for Plus and Edu coming soon.
How Codex works
Today you can access Codex through the sidebar in ChatGPT and assign it new coding tasks by typing a prompt and clicking “Code”. If you want to ask Codex a question about your codebase, click “Ask”. Each task is processed independently in a separate, isolated environment preloaded with your codebase. Codex can read and edit files, as well as run commands including test harnesses, linters, and type checkers. Task completion typically takes between 1 and 30 minutes, depending on complexity, and you can monitor Codex’s progress in real time.
Once Codex completes a task, it commits its changes in its environment. Codex provides verifiable evidence of its actions through citations of terminal logs and test outputs, allowing you to trace each step taken during task completion. You can then review the results, request further revisions, open a GitHub pull request, or directly integrate the changes into your local environment. In the product, you can configure the Codex environment to match your real development environment as closely as possible.
Codex can be guided by AGENTS.md files placed within your repository. These are text files, akin to README.md, where you can inform Codex how to navigate your codebase, which commands to run for testing, and how best to adhere to your project's standard practices. Like human developers, Codex agents perform best when provided with configured dev environments, reliable testing setups, and clear documentation.
On coding evaluations and internal benchmarks, codex-1 shows strong performance even without AGENTS.md files or custom scaffolding.
23 SWE-Bench Verified samples that were not runnable on our internal infrastructure were excluded. codex-1 was tested at a maximum context length of 192k tokens and medium ‘reasoning effort’, which is the setting that will be available in product today. For details on o3 evaluations, see here. Our internal SWE task benchmark is a curated set of real-world internal SWE tasks at OpenAI.
Building safe and trustworthy agents
We're releasing Codex as a research preview, in line with our iterative deployment strategy. We prioritized security and transparency when designing Codex so users can verify its outputs - a safeguard that grows increasingly more important as AI models handle more complex coding tasks independently and safety considerations evolve. Users can check Codex’s work through citations, terminal logs and test results. When uncertain or faced with test failures, the Codex agent explicitly communicates these issues, enabling users to make informed decisions about how to proceed. It still remains essential for users to manually review and validate all agent-generated code before integration and execution.
Aligning to human preferences
A primary goal while training codex-1 was to align outputs closely with human coding preferences and standards. Compared to OpenAI o3, codex-1 consistently produces cleaner patches ready for immediate human review and integration into standard workflows.
Please fix the following issue in the astropy/astropy repository. Please resolve the issue in the problem below by editing and testing code files in your current code execution session. The repository is cloned in the /testbed folder. You must fully solve the problem for your answer to be considered correct. Problem statement:Modeling's separability_matrix does not compute separability correctly for nested CompoundModels Consider the following model: ``python from astropy.modeling import models as m from astropy.modeling.separable import separability_matrix cm = m.Linear1D(10) & m.Linear1D(5) It's separability matrix as you might expect is a diagonal: python >>> separability_matrix(cm) array([[ True, False], [False, True]]) If I make the model more complex: python >>> separability_matrix(m.Pix2Sky_TAN() & m.Linear1D(10) & m.Linear1D(5)) array([[ True, True, False, False], [ True, True, False, False], [False, False, True, False], [False, False, False, True]]) The output matrix is again, as expected, the outputs and inputs to the linear models are separable and independent of each other. If however, I nest these compound models: python >>> separability_matrix(m.Pix2Sky_TAN() & cm) array([[ True, True, False, False], [ True, True, False, False], [False, False, True, True], [False, False, True, True]]) `` Suddenly the inputs and outputs are no longer separable? This feels like a bug to me, but I might be missing something?
Expand
Codex
OpenAI o3
Preventing abuse
Safeguarding against malicious applications of AI-driven software engineering, such as malware development, is increasingly critical. At the same time, it’s important that protective measures do not unduly hinder legitimate and beneficial applications that may involve techniques sometimes also used for malware development, such as low level kernel engineering.
To balance safety and utility, Codex was trained to identify and precisely refuse requests aimed at development of malicious software, while clearly distinguishing and supporting legitimate tasks. We've also enhanced our…
Excerpt shown — open the source for the full document.
Notability
notability 9.0/10Major model launch with high traction.