Scaling PostgreSQL to power 800 million ChatGPT users
Captured source
source ↗Scaling PostgreSQL to power 800 million ChatGPT users | OpenAI
January 22, 2026
Scaling PostgreSQL to power 800 million ChatGPT users
By Bohan Zhang, Member of the Technical Staff
Loading…
Share
For years, PostgreSQL has been one of the most critical, under-the-hood data systems powering core products like ChatGPT and OpenAI’s API. As our user base grows rapidly, the demands on our databases have increased exponentially, too. Over the past year, our PostgreSQL load has grown by more than 10x, and it continues to rise quickly.
Our efforts to advance our production infrastructure to sustain this growth revealed a new insight: PostgreSQL can be scaled to reliably support much larger read-heavy workloads than many previously thought possible. The system (initially created by a team of scientists at University of California, Berkeley) has enabled us to support massive global traffic with a single primary Azure PostgreSQL flexible server instance and nearly 50 read replicas spread over multiple regions globally. This is the story of how we’ve scaled PostgreSQL at OpenAI to support millions of queries per second for 800 million users through rigorous optimizations and solid engineering; we’ll also cover key takeaways we learned along the way.
Cracks in our initial design
After the launch of ChatGPT, traffic grew at an unprecedented rate. To support it, we rapidly implemented extensive optimizations at both the application and PostgreSQL database layers, scaled up by increasing the instance size, and scaled out by adding more read replicas. This architecture has served us well for a long time. With ongoing improvements, it continues to provide ample runway for future growth.
It may sound surprising that a single-primary architecture can meet the demands of OpenAI’s scale; however, making this work in practice isn’t simple. We’ve seen several SEVs caused by Postgres overload, and they often follow the same pattern: an upstream issue causes a sudden spike in database load, such as widespread cache misses from a caching-layer failure, a surge of expensive multi-way joins saturating CPU, or a write storm from a new feature launch. As resource utilization climbs, query latency rises and requests begin to time out. Retries then further amplify the load, triggering a vicious cycle with the potential to degrade the entire ChatGPT and API services.
Although PostgreSQL scales well for our read-heavy workloads, we still encounter challenges during periods of high write traffic. This is largely due to PostgreSQL’s multiversion concurrency control (MVCC) implementation, which makes it less efficient for write-heavy workloads. For example, when a query updates a tuple or even a single field, the entire row is copied to create a new version. Under heavy write loads, this results in significant write amplification. It also increases read amplification, since queries must scan through multiple tuple versions (dead tuples) to retrieve the latest one. MVCC introduces additional challenges such as table and index bloat, increased index maintenance overhead, and complex autovacuum tuning. (You can find a deep-dive on these issues in a blog I wrote with Prof. Andy Pavlo at Carnegie Mellon University called The Part of PostgreSQL We Hate the Most, cited in the PostgreSQL Wikipedia page.)
Scaling PostgreSQL to millions of QPS
To mitigate these limitations and reduce write pressure, we’ve migrated, and continue to migrate, shardable (i.e. workloads that can be horizontally partitioned), write-heavy workloads to sharded systems such as Azure Cosmos DB, optimizing application logic to minimize unnecessary writes. We also no longer allow adding new tables to the current PostgreSQL deployment. New workloads default to the sharded systems.
Even as our infrastructure has evolved, PostgreSQL has remained unsharded, with a single primary instance serving all writes. The primary rationale is that sharding existing application workloads would be highly complex and time-consuming, requiring changes to hundreds of application endpoints and potentially taking months or even years. Since our workloads are primarily read-heavy, and we’ve implemented extensive optimizations, the current architecture still provides ample headroom to support continued traffic growth. While we’re not ruling out sharding PostgreSQL in the future, it’s not a near-term priority given the sufficient runway we have for current and future growth.
In the following sections, we’ll dive into the challenges we faced and the extensive optimizations we implemented to address them and prevent future outages, pushing PostgreSQL to its limits and scaling it to millions of queries per second (QPS).
Reducing load on the primary
Challenge: With only one writer, a single-primary setup can’t scale writes. Heavy write spikes can quickly overload the primary and impact services like ChatGPT and our API.
Solution: We minimize load on the primary as much as possible—both reads and writes—to ensure it has sufficient capacity to handle write spikes. Read traffic is offloaded to replicas wherever possible. However, some read queries must remain on the primary because they’re part of write transactions. For those, we focus on ensuring they’re efficient and avoid slow queries. For write traffic, we’ve migrated shardable, write-heavy workloads to sharded systems such as Azure CosmosDB. Workloads that are harder to shard but still generate high write volume take longer to migrate, and that process is still ongoing. We also aggressively optimized our applications to reduce write load; for example, we’ve fixed application bugs that caused redundant writes and introduced lazy writes, where appropriate, to smooth traffic spikes. In addition, when backfilling table fields, we enforce strict rate limits to prevent excessive write pressure.
Query optimization
Challenge: We identified several expensive queries in PostgreSQL. In the past, sudden volume spikes in these queries would consume large amounts of CPU, slowing both ChatGPT and API requests.
Solution: A few expensive queries, such as those joining many tables together, can significantly degrade or even bring down the entire service. We need to continuously optimize PostgreSQL queries to ensure they’re efficient and avoid common Online Transaction Processing (OLTP) anti-patterns. For example, we once identified an extremely costly query that joined 12 tables, where spikes in this query…
Excerpt shown — open the source for the full document.
Notability
notability 7.0/10Notable technical post, strong community traction.