
How We Used Codex to Ship Sora for Android in 28 Days


December 12, 2025

How we used Codex to build Sora for Android in 28 days

By Patrick Hum and RJ Marsan, Members of the Technical Staff


As of April 26, 2026, the Sora product is no longer available.

---

In November, we launched the Sora Android app to the world, giving anyone with an Android device the ability to turn a short prompt into a vivid video. On launch day, the app reached #1 in the Play Store. Android users generated more than a million videos in the first 24 hours.

Behind the launch is a story: the initial version of Sora’s production Android app was built in 28 days, thanks to the same agent that’s available to any team or developer: Codex.

From October 8 to November 5, 2025, a lean engineering team, working alongside Codex and consuming roughly 5 billion tokens, took Sora for Android from prototype to global launch. Despite its scale, the app has a crash-free rate of 99.9 percent and an architecture we’re proud of. If you’re wondering whether we used a secret model: we used an early version of the GPT‑5.1‑Codex model, the same version that any developer or business can use today via CLI, IDE extension, or web app.

Prompt: figure skater performs a triple axle with a cat on her head

Embracing Brooks’ Law: Staying nimble to move fast

When Sora launched on iOS, usage exploded. People immediately began generating a stream of videos. On Android, by contrast, we had only a small internal prototype and a mounting number of pre-registered users on Google Play.

A common response to a high-stakes, time-pressured launch is to pile on resources and add process. A production app of this scope and quality would typically involve many engineers working for months, slowed down by coordination.

American computer architect Fred Brooks famously warned that “adding more people to a late software project makes it later.” In other words, when trying to ship a complex project quickly, adding more engineers can often slow the work down by increasing communication overhead, task fragmentation, and integration costs. We leaned into this insight instead of ignoring it: we assembled a strong team of four engineers, all equipped with Codex to drastically increase each engineer’s impact.

Working this way, we shipped an internal build of Sora for Android to employees in 18 days and launched publicly 10 days later. We maintained a high bar on Android engineering practices, invested in maintainability, and held the app to the same reliability bar we would expect from a more traditional project. (We also continue to use Codex extensively today to evolve the app and bring new features to it.)

Onboarding a new senior engineer

To make sense of how we worked with Codex, it helps to know where it shines and where it needs direction. Treating it like a newly hired senior engineer was a good approach. Codex’s ability meant we could spend more time directing and reviewing code than writing it ourselves.

Where Codex needs guidance

1. Codex isn’t yet great at inferring what it hasn’t been told (e.g., your preferred architecture patterns, product strategy, real user behavior, and internal norms or shortcuts).
2. Similarly, Codex couldn’t see the app actually run: It couldn’t open Sora on a device, notice that a scroll felt off, or sense that a flow was confusing. Only our team could cover these experiential tasks.
3. Each instance requires onboarding. Sharing context with clear goals, constraints, and guidance on “how we do things” was essential to making Codex execute well.
4. In the same vein, Codex struggled with deep architectural judgment: Left on its own, it might introduce an extra view model where we really wanted to extend an existing one or push logic into the UI layer that clearly belonged in a repository. Its instinct is to get something working, not to prioritize long‑term cleanliness.

We found it useful to have Codex create and maintain a generous number of AGENTS.md files throughout the codebase. This made it easy to apply the same guidance and best practices across sessions. For example, to ensure Codex wrote code that followed our style guidelines, we added the following to our top-level AGENTS.md:

    ## Formatting and static checks
    - **Always run** `./gradlew detektFix` (or for the affected modules) **before committing**. CI will fail if formatting or detekt issues are present.
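The parenthetical "or for the affected modules" implies scoping the check to what actually changed. Below is a minimal sketch of how that scoping might be computed. It assumes a hypothetical repository layout in which a Gradle module is the directory directly above the first `src` path segment, and it assumes the `detektFix` task is also available per module under Gradle's `:module:task` path form. The helper name and the sample paths are illustrative and do not come from the post.

```python
from pathlib import PurePosixPath


def detekt_tasks_for(changed_files, task="detektFix"):
    """Map changed file paths to module-scoped Gradle task paths.

    Assumption (hypothetical layout): a module is the directory directly
    above the first `src` path segment, e.g. feature/player/src/...
    """
    tasks = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        if "src" not in parts:
            continue  # not inside a module source tree; skip it
        module_parts = parts[: parts.index("src")]
        if not module_parts:
            tasks.add(f":{task}")  # src/ at the repo top level: root project
        else:
            tasks.add(":" + ":".join(module_parts) + f":{task}")
    return sorted(tasks)


if __name__ == "__main__":
    changed = [
        "app/src/main/java/Feed.kt",
        "feature/player/src/main/java/Player.kt",
        "README.md",
    ]
    # Builds one ./gradlew invocation covering only the touched modules.
    print("./gradlew " + " ".join(detekt_tasks_for(changed)))
```

In practice the list of changed files would come from version control (for example, the staged files in a pre-commit hook), and the resulting command would be run before the commit is created.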

Where Codex excels

1. Reading and understanding large codebases rapidly: Codex knows essentially all major programming languages, which makes it easier to leverage the same concepts across many platforms without complex abstractions.
2. Test coverage: Codex is (uniquely) enthusiastic about writing unit tests to cover a broad variety of cases. Not every test was deep, but having breadth of coverage was helpful in preventing regressions.
3. Applying feedback: In a similar vein, Codex is good at reacting to feedback. When CI failed, we could paste log output into a prompt and ask Codex to propose fixes.
4. Massively parallel, disposable execution: Most people won’t come close to the number of sessions they could actually run at any one time, so it’s highly feasible to test multiple ideas in parallel and treat code as disposable.
5. Offering new perspective: In design discussions, we used Codex as a generative tool to explore potential failure points and discover new ways to solve a problem. For example, while we designed video player memory optimizations, Codex sifted through multiple SDKs to propose approaches we wouldn’t have had time to parse. The insights from Codex’s research proved invaluable in minimizing memory footprint in the final app.
6. Enabling higher‑leverage work: In practice, we ended up spending more time reviewing and directing code than writing it ourselves. That said, Codex is very good at code review, too, often catching bugs before they’re merged, improving reliability.
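The feedback loop described under "Applying feedback" (paste CI output into a prompt, ask for fixes) can be made repeatable with a small helper. The sketch below only builds prompt text and does not call Codex or any API. The failure markers, the context window, and the prompt wording are all illustrative assumptions, not details from the post.

```python
def build_fix_prompt(ci_log, markers=("FAILED", "error:", "Exception"),
                     context=2, max_lines=40):
    """Extract failure lines (plus surrounding context) from a CI log
    and wrap them in a prompt asking for a proposed fix.

    The default markers are hypothetical; tune them to your CI output.
    """
    lines = ci_log.splitlines()
    keep = set()
    for i, line in enumerate(lines):
        if any(marker in line for marker in markers):
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            keep.update(range(start, end))
    # Fall back to the tail of the log when no marker matched.
    selected = [lines[i] for i in sorted(keep)] or lines[-max_lines:]
    excerpt = "\n".join(selected[-max_lines:])
    return (
        "CI failed with the output below. Propose a minimal fix and "
        "explain the root cause.\n\n" + excerpt
    )
```

Trimming the log to the failing lines keeps the prompt focused. The tail fallback covers logs whose failure format the markers do not anticipate.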

Once we acknowledged these characteristics, our working model became more straightforward. We leaned on Codex to do a huge amount of heavy lifting inside well‑understood patterns and well‑bounded scopes, while our team focused on architecture, user experience, systemic changes, and final quality.

Laying the foundation by hand

Even the best new, senior hire doesn’t have the right vantage point for making long-term trade-offs right away. To leverage Codex and ensure its work was robust and maintainable,…
