Inside OpenAI’s in-house data agent
January 29, 2026
By Bonnie Xu, Aravind Suresh, and Emma Tang
Data powers how systems learn, how products evolve, and how companies make choices. But getting answers quickly, correctly, and with the right context is often harder than it should be. To make this easier as OpenAI scales, we built a bespoke in-house AI data agent that explores and reasons over our own platform.
Our agent is a custom internal-only tool (not an external offering), built specifically around OpenAI’s data, permissions, and workflows. We’re showing how we built and use it to help surface examples of the real, impactful ways AI can support day-to-day work across our teams. The OpenAI tools we used to build and run it (Codex, our GPT‑5 flagship model, the Evals API, and the Embeddings API) are the same tools we make available to developers everywhere.
Our data agent lets employees go from question to insight in minutes, not days. This lowers the bar for pulling data and doing nuanced analysis across all functions, not just on our data team. Today, teams across Engineering, Data Science, Go-To-Market, Finance, and Research at OpenAI lean on the agent to answer high-impact data questions. For example, it can help evaluate launches and understand business health, all through natural language. The agent combines Codex-powered table-level knowledge with product and organizational context. Its continuously learning memory system means it also improves with every turn.
In this post, we’ll break down why we needed a bespoke AI data agent, what makes its code-enriched data context and self-learning so useful, and lessons we learned along the way.
Why we needed a custom tool
OpenAI’s data platform serves more than 3.5k internal users across Engineering, Product, and Research, and spans over 600 petabytes of data across 70k datasets. At that size, simply finding the right table can be one of the most time-consuming parts of doing analysis.
As one internal user put it:
“We have a lot of tables that are fairly similar, and I spend tons of time trying to figure out how they’re different and which to use. Some include logged-out users, some don’t. Some have overlapping fields; it’s hard to tell what is what.”
Even with the correct tables selected, producing correct results can be challenging. Analysts must reason about table data and table relationships to ensure transformations and filters are applied correctly. Common failure modes—many-to-many joins, filter pushdown errors, and unhandled nulls—can silently invalidate results. At OpenAI’s scale, analysts should not have to sink time into debugging SQL semantics or query performance: their focus should be on defining metrics, validating assumptions, and making data-driven decisions.
This SQL statement is 180+ lines long. It’s not easy to know if we’re joining the right tables and querying the right columns.
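Two of the failure modes named above, many-to-many joins and unhandled nulls, are easy to reproduce on a toy dataset. The following is a minimal sketch using Python's built-in `sqlite3` module; the `orders` and `sessions` tables and their contents are hypothetical and are not drawn from OpenAI's data platform.

```python
import sqlite3

# Hypothetical toy tables, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, user_id INTEGER, amount REAL);
    CREATE TABLE sessions (session_id INTEGER, user_id INTEGER);
    INSERT INTO orders VALUES (1, 10, 100.0), (2, 10, 50.0), (3, 20, NULL);
    INSERT INTO sessions VALUES (1, 10), (2, 10), (3, 20);
""")

# Ground truth: total revenue computed directly on the orders table.
true_total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# Many-to-many join: user 10 has two orders and two sessions, so every
# order row for that user is duplicated and the sum is silently inflated.
joined_total = conn.execute("""
    SELECT SUM(o.amount)
    FROM orders o
    JOIN sessions s ON o.user_id = s.user_id
""").fetchone()[0]

# Unhandled NULL: AVG ignores NULL amounts, so the result depends on
# whether a missing amount should count as zero or be excluded.
avg_skipping_nulls = conn.execute("SELECT AVG(amount) FROM orders").fetchone()[0]
avg_nulls_as_zero = conn.execute(
    "SELECT AVG(COALESCE(amount, 0)) FROM orders"
).fetchone()[0]

print(true_total, joined_total)               # 150.0 300.0
print(avg_skipping_nulls, avg_nulls_as_zero)  # 75.0 50.0
```

Neither query raises an error, which is why these mistakes tend to surface only when a number is cross-checked against a trusted source.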
How it works
Let’s walk through what our agent is, how it curates context, and how it keeps self-improving.
Our agent is powered by GPT‑5.2 and is designed to reason over OpenAI’s data platform. It’s available wherever employees already work: as a Slack agent, through a web interface, inside IDEs, in the Codex CLI via MCP, and directly in OpenAI’s internal ChatGPT app through an MCP connector.
Users can ask complex, open-ended questions that would typically require multiple rounds of manual exploration. Take this example prompt, which uses a test dataset: “For NYC taxi trips, which pickup-to-dropoff ZIP pairs are the most unreliable, with the largest gap between typical and worst-case travel times, and when does that variability occur?”
The agent handles the analysis end-to-end, from understanding the question to exploring the data, running queries, and synthesizing findings.
The agent's response to the question.
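To make the question concrete, one possible way to operationalize "the gap between typical and worst-case travel times" is the difference between the 95th percentile and the median trip duration for each ZIP pair. This is an assumption for illustration, not the definition the agent used; the function name `reliability_gaps` and the trip records below are hypothetical.

```python
from collections import defaultdict
from statistics import median, quantiles

# Hypothetical (pickup_zip, dropoff_zip, minutes) records, not real taxi data.
trips = [
    ("10001", "10019", 10), ("10001", "10019", 11), ("10001", "10019", 12),
    ("10001", "10019", 11), ("10001", "10019", 10), ("10001", "10019", 45),
    ("11201", "10004", 20), ("11201", "10004", 21), ("11201", "10004", 22),
    ("11201", "10004", 21), ("11201", "10004", 20), ("11201", "10004", 23),
]

def reliability_gaps(records):
    """Rank ZIP pairs by (p95 - median) trip duration, largest gap first."""
    by_pair = defaultdict(list)
    for pickup, dropoff, minutes in records:
        by_pair[(pickup, dropoff)].append(minutes)

    gaps = []
    for pair, durations in by_pair.items():
        if len(durations) < 2:
            continue  # quantiles() needs at least two data points
        # n=20 yields 19 cut points; the last one is the 95th percentile.
        p95 = quantiles(durations, n=20, method="inclusive")[18]
        gaps.append((pair, p95 - median(durations)))
    return sorted(gaps, key=lambda item: item[1], reverse=True)

ranked = reliability_gaps(trips)
for pair, gap in ranked:
    print(pair, round(gap, 2))
```

Answering the "when" part of the question would add a time bucket, such as hour of day, to the grouping key.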
One of the agent’s superpowers is how it reasons through problems. Rather than following a fixed script, the agent evaluates its own progress. If an intermediate result looks wrong (e.g., if it has zero rows due to an incorrect join or filter), the agent investigates what went wrong, adjusts its approach, and tries again. Throughout this process, it retains full context, and carries learnings forward between steps. This closed-loop, self-learning process shifts iteration from the user into the agent itself, enabling faster results and consistently higher-quality analyses than manual workflows.
The agent’s reasoning to identify the most unreliable NYC taxi pickup–dropoff pairs.
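The check-and-retry behavior described above can be sketched as a small loop. This is a simplified illustration, not the agent's implementation: `run_with_self_check` and `toy_revise` are hypothetical names, and the `revise` callback stands in for the model's own investigation of what went wrong.

```python
import sqlite3
from typing import Callable

def run_with_self_check(
    conn: sqlite3.Connection,
    sql: str,
    revise: Callable[[str, list[str]], str],
    max_attempts: int = 3,
) -> tuple[list[tuple], list[str]]:
    """Run a query; if it returns zero rows, record a note and try a revision."""
    notes: list[str] = []  # context carried forward between attempts
    for attempt in range(1, max_attempts + 1):
        rows = conn.execute(sql).fetchall()
        if rows:
            return rows, notes
        notes.append(f"attempt {attempt}: zero rows from {sql!r}")
        sql = revise(sql, notes)
    raise RuntimeError(f"no rows after {max_attempts} attempts: {notes}")

# Hypothetical toy table for the demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trips (city TEXT, minutes REAL);
    INSERT INTO trips VALUES ('NYC', 12.5), ('NYC', 40.0), ('SF', 18.0);
""")

def toy_revise(sql: str, notes: list[str]) -> str:
    # Stand-in for the model: the filter literal used the wrong case.
    return sql.replace("'nyc'", "'NYC'")

first_try = "SELECT minutes FROM trips WHERE city = 'nyc' ORDER BY minutes"
rows, notes = run_with_self_check(conn, first_try, toy_revise)
print(rows)   # [(12.5,), (40.0,)]
print(notes)  # one note describing the empty first attempt
```

An empty result is only one signal of a bad intermediate step; a fuller version would also check things like unexpected row counts or implausible aggregates.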
The agent covers the full analytics workflow: discovering data, running SQL, and publishing notebooks and reports. It understands internal company knowledge, can search the web for external information, and improves over time through learned usage and memory.
Context is everything
High-quality answers depend on rich, accurate context. Without context, even strong models can produce wrong results, such as vastly misestimating user counts or misinterpreting internal terminology.
The agent without memory, unable to query effectively.
The agent’s memory enables faster queries by locating the correct tables.
To avoid these failure modes, the agent is built around multiple layers of context that ground it in OpenAI’s data and institutional knowledge.
Layer #1: Table Usage
- Metadata grounding: The agent relies on schema metadata (column names and data types) to inform SQL writing and uses table lineage (e.g., upstream and downstream table relationships) to provide context on how different tables relate.
- Query inference: Ingesting historical queries helps the agent understand how to write its own queries and which tables are typically joined together.
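The two ideas in this layer can be sketched on a small scale. The example below is an assumption-laden illustration, not the agent's pipeline: it reads column names and types from SQLite via `PRAGMA table_info`, and it counts which tables appear together in past queries using a simple regular expression. The helper names `schema_metadata` and `co_joined_tables`, the tables, and the query history are all hypothetical, and table lineage is not covered.

```python
import re
import sqlite3
from collections import Counter
from itertools import combinations

def schema_metadata(conn: sqlite3.Connection, table: str) -> list[tuple[str, str]]:
    """Return (column_name, declared_type) pairs for a table."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    return [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({table})")]

# Naive pattern: matches a bare identifier after FROM or JOIN. It does not
# handle subqueries, CTEs, or schema-qualified names.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE)

def co_joined_tables(queries: list[str]) -> Counter:
    """Count how often each pair of tables appears in the same query."""
    pairs: Counter = Counter()
    for sql in queries:
        tables = sorted({name.lower() for name in TABLE_REF.findall(sql)})
        pairs.update(combinations(tables, 2))
    return pairs

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")

history = [
    "SELECT * FROM events e JOIN users u ON e.user_id = u.id",
    "SELECT COUNT(*) FROM events JOIN users ON events.user_id = users.id",
    "SELECT * FROM events JOIN plans ON events.plan_id = plans.id",
]

print(schema_metadata(conn, "users"))            # [('id', 'INTEGER'), ('email', 'TEXT')]
print(co_joined_tables(history).most_common(1))  # [(('events', 'users'), 2)]
```

A production system would more likely use a real SQL parser than a regular expression, since historical queries routinely contain the constructs this pattern ignores.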
Layer #2: Human Annotations
- Curated descriptions of tables and columns provided by domain experts, capturing intent, semantics, business meaning, and known caveats that are not easily inferred from schemas or past queries.
Metadata alone isn’t enough. To really tell tables apart, you need to understand how they were created and where they originate.
Layer #3: Codex Enrichment
By deriving a code-level definition of a table, the agent builds a deeper understanding of what the data actually contains.
- Nuances of what is stored in the table and how it is derived from an analytics event provide extra information. For example, it can give context…