Unlocking the Codex harness: how we built the App Server
February 4, 2026
By Celia Chen, Member of the Technical Staff
OpenAI’s coding agent Codex exists across many different surfaces: the web app, the CLI, the IDE extension, and the new Codex macOS app. Under the hood, they’re all powered by the same Codex harness, the agent loop and logic that underlies all Codex experiences. The critical link between them? The Codex App Server, a client-friendly, bidirectional JSON-RPC API.
In this post, we’ll introduce the Codex App Server and share what we’ve learned so far about the best ways to bring Codex’s capabilities into your product and help your users supercharge their workflows. We’ll cover the App Server’s architecture and protocol, how it integrates with different Codex surfaces, and tips on leveraging Codex, whether you want to turn it into a code reviewer, an SRE agent, or a coding assistant.
Origin of the App Server
Before diving into architecture, it’s helpful to know the App Server’s backstory. Initially, the App Server was a practical way to reuse the Codex harness across products; over time, it gradually evolved into our standard protocol.
Codex CLI started as a TUI (terminal user interface), meaning Codex is accessed through the terminal. When we built the VS Code extension (a more IDE-friendly way to interact with Codex agents), we needed to reuse the same harness so that an IDE UI could drive the same agent loop without re-implementing it. That meant supporting rich interaction patterns beyond request/response, such as exploring the workspace, streaming progress as the agent reasons, and emitting diffs. We first experimented with exposing Codex as an MCP server, but maintaining MCP semantics in a way that made sense for VS Code proved difficult. Instead, we introduced a JSON-RPC protocol that mirrored the TUI loop, which became the unofficial first version of the App Server. At the time, we didn’t expect other clients to depend on the App Server, so it wasn’t designed as a stable API.
As Codex adoption grew over the next few months, internal teams and external partners wanted the ability to embed the same harness in their own products in order to accelerate their users’ software development workflows. For example, JetBrains and Xcode wanted an IDE-grade agent experience, while the Codex desktop app needed to orchestrate many Codex agents in parallel. Those demands pushed us to design a platform surface that both our products and partner integrations could safely depend on over time. It needed to be easy to integrate and backward compatible, meaning we could evolve the protocol without breaking existing clients.
Next, we’ll walk through how we designed the architecture and protocol so different clients can use the same harness.
Inside the Codex harness
First, let’s zoom in on what’s inside the Codex harness and how the Codex App Server exposes it to clients. In our last Codex blog, we broke down the core agent loop that orchestrates the interaction between the user, the model, and the tools. This is the core logic of the Codex harness, but there’s more to the full agent experience:
1. Thread lifecycle and persistence. A thread is a Codex conversation between a user and an agent. Codex creates, resumes, forks, and archives threads, and persists the event history so clients can reconnect and render a consistent timeline.
2. Config and auth. Codex loads configuration, manages defaults, and runs authentication flows like “Sign in with ChatGPT,” including credential state.
3. Tool execution and extensions. Codex executes shell/file tools in a sandbox and wires up integrations like MCP servers and skills so they can participate in the agent loop under a consistent policy model.
All the agent logic we mentioned here, including the core agent loop, lives in a part of the Codex CLI codebase called “Codex core.” Codex core is both a library where all the agent code lives and a runtime that can be spun up to run the agent loop and manage the persistence of one Codex thread (conversation).
To be useful, the Codex harness needs to be accessible to clients. That’s where the App Server comes in.
The App Server is both the JSON-RPC protocol between the client and the server and a long-lived process that hosts the Codex core threads. As we can see from the diagram above, an App Server process has four main components: the stdio reader, the Codex message processor, the thread manager, and core threads. The thread manager spins up one core session for each thread, and the Codex message processor then communicates with each core session directly to submit client requests and receive updates.
One client request can result in many event updates, and these detailed events are what allow us to build a rich UI on top of the App Server. Furthermore, the stdio reader and the Codex message processor serve as the translation layer between the client and Codex core threads. They translate client JSON-RPC requests into Codex core operations, listen to Codex core’s internal event stream, and then transform those low-level events into a small set of stable, UI-ready JSON-RPC notifications.
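The translation step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the App Server’s actual code: the internal event kinds (`agent_message_begin`, `agent_message_chunk`, `token_count`), the `payload` field, and the concrete method name `item/agentMessage/delta` are hypothetical. The `jsonrpc` field follows the JSON-RPC 2.0 specification; the real wire format may differ.

```python
import json
from typing import Any, Dict, Optional

# Hypothetical mapping from internal core event kinds to public notification
# methods. The kinds on the left are invented for illustration; the method
# names on the right follow the item/started and item/*/delta shapes only.
CORE_TO_PUBLIC = {
    "agent_message_begin": "item/started",
    "agent_message_chunk": "item/agentMessage/delta",
}


def to_notification(core_event: Dict[str, Any]) -> Optional[str]:
    """Translate one core event into a JSON-RPC notification line.

    Returns None for events that are not surfaced to clients.
    """
    method = CORE_TO_PUBLIC.get(core_event["kind"])
    if method is None:
        return None  # low-level event with no stable public equivalent
    message = {
        "jsonrpc": "2.0",  # per the JSON-RPC 2.0 spec; an assumption here
        "method": method,
        # A notification carries no "id", so no response is expected.
        "params": core_event.get("payload", {}),
    }
    return json.dumps(message)


if __name__ == "__main__":
    print(to_notification({"kind": "agent_message_chunk", "payload": {"delta": "hi"}}))
    print(to_notification({"kind": "token_count"}))
```

Keeping the mapping in one table is one way to keep the public surface small: internal events can change freely as long as the table still produces the same notifications.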
The JSON-RPC protocol between the client and the App Server is fully bidirectional. A typical thread has a client request and many server notifications. In addition, the server can initiate requests when the agent needs input, like an approval, and then pause the turn until the client responds.
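To make the bidirectional flow concrete, here is a rough sketch of how a client might tell the three kinds of incoming messages apart and answer a server-initiated request. The classification rules come from the JSON-RPC 2.0 specification (a request has both `method` and `id`, a notification has `method` only, a response has `id` plus `result` or `error`). The method name `approval/request` and the `approved` result field are hypothetical, chosen only for illustration.

```python
from typing import Any, Dict


def classify(message: Dict[str, Any]) -> str:
    """Classify an incoming JSON-RPC message by its shape."""
    has_method = "method" in message
    has_id = "id" in message
    if has_method and has_id:
        return "request"  # the server is asking the client for something
    if has_method:
        return "notification"  # fire-and-forget update, no reply expected
    if has_id and ("result" in message or "error" in message):
        return "response"  # answer to an earlier client request
    return "invalid"


def approval_response(request: Dict[str, Any], approved: bool) -> Dict[str, Any]:
    """Build the reply to a server-initiated request.

    The reply must echo the request's id so the server can match it to the
    paused turn. The "approved" field is a hypothetical result shape.
    """
    return {"jsonrpc": "2.0", "id": request["id"], "result": {"approved": approved}}


if __name__ == "__main__":
    incoming = {"jsonrpc": "2.0", "id": 7, "method": "approval/request", "params": {}}
    if classify(incoming) == "request":
        print(approval_response(incoming, approved=True))
```

A client that treats every incoming message as a response or a notification never answers such a request, and the turn stays paused, so the request branch is worth handling explicitly.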
The conversation primitives
Next, we’ll break down the conversation primitives, the building blocks of the App Server protocol. Designing an API for an agent loop is tricky because the user/agent interaction is not a simple request/response. One user request can unfold into a structured sequence of actions that the client needs to represent faithfully: the user’s input, the agent’s incremental progress, and artifacts produced along the way (e.g., diffs). To make that interaction stream easy to integrate and resilient across UIs, we landed on three core primitives with clear boundaries and lifecycles:
1. Item: An item is the atomic unit of input/output in Codex. Items are typed (e.g., user message, agent message, tool execution, approval request, diff) and each has an explicit lifecycle:
- item/started when the item begins
- optional item/*/delta events as content streams in (for streaming item types)…
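The two lifecycle events shown so far are enough to sketch how a client might assemble streamed content. This is a minimal sketch under stated assumptions: the `id` and `delta` parameter names are hypothetical, and the sketch handles only the events named in the list above.

```python
from typing import Any, Dict


class ItemTracker:
    """Accumulates streamed text per item from lifecycle notifications."""

    def __init__(self) -> None:
        # Maps a (hypothetical) item id to the text received so far.
        self.text: Dict[str, str] = {}

    def handle(self, method: str, params: Dict[str, Any]) -> None:
        parts = method.split("/")
        if method == "item/started":
            # A new item begins with empty content.
            self.text[params["id"]] = ""
        elif len(parts) == 3 and parts[0] == "item" and parts[2] == "delta":
            # Matches the item/*/delta shape for any item type.
            item_id = params["id"]
            self.text[item_id] = self.text.get(item_id, "") + params["delta"]
        # Any other method is ignored in this sketch.


if __name__ == "__main__":
    tracker = ItemTracker()
    tracker.handle("item/started", {"id": "a1"})
    tracker.handle("item/agentMessage/delta", {"id": "a1", "delta": "Hel"})
    tracker.handle("item/agentMessage/delta", {"id": "a1", "delta": "lo"})
    print(tracker.text["a1"])
```

Matching on the method shape rather than on a fixed list of item types lets the client render streaming for item types it has not seen before.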