Announcing Lakebase Search: agent-native retrieval built into Lakebase Postgres
Databricks Blog
Today, we're introducing Lakebase Search: hybrid vector and full-text retrieval built into Lakebase, available now in beta on AWS and Azure. Powered by two native Postgres extensions, lakebase_vector and lakebase_text, it lets your entire agent loop run on a single data backend: a lakebase. This brings next-level scale, next-level economics, and agent-first ergonomics.

Agents transform search into an operational workflow: they retrieve context, reason, act, and remember. This tightly couples the read path (retrieval) with the write path (memory), so retrieval must reach freshly generated insights in real time. Until now, that loop had no Postgres-native home built for the scale and economics that search demands.

For agents, search is actually an operational workload

Agents now operate 4x more databases on Lakebase than human users do, and their primary requirement is entirely different from a human's. Traditional search engines assume a read-only snapshot of stale data. Agents, however, treat search like a live operational database. Look at a typical agent schema: chunked documents and embeddings live directly alongside an active conversational memory log. This creates a continuous read/write loop. Agents write new learnings to memory on one turn and need that exact data fully indexed and searchable on the next. They don't just need fast retrieval; they need instant search over the latest writes.

Search is a strange workload

Search is a unique workload with two defining properties. First, you store vastly more data than you actually query, leaving the majority of it cold. Second, vector search causes severe data bloat. A 1 KB text file expands when vectorized, because the document is split into multiple chunks and each chunk generates its own high-dimensional embedding (a single 768-dimensional float32 embedding occupies 3 KB on its own), even before accounting for index overhead.
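To make the bloat concrete, here is a back-of-the-envelope calculation. The helper name, the 256-byte chunk size, and the 768-dimensional float32 embedding are illustrative assumptions, not Lakebase defaults. Real chunkers usually split on tokens and add overlap between chunks, which only raises the chunk count.

```python
import math


def embedding_bytes(text_bytes: int, chunk_bytes: int, dims: int, bytes_per_dim: int = 4) -> int:
    """Raw embedding storage for one document, before any index overhead.

    Hypothetical helper: assumes fixed-size, non-overlapping chunks and one
    dense embedding of `dims` values (float32 by default) per chunk.
    """
    n_chunks = math.ceil(text_bytes / chunk_bytes)
    return n_chunks * dims * bytes_per_dim


if __name__ == "__main__":
    raw = embedding_bytes(text_bytes=1024, chunk_bytes=256, dims=768)
    # 4 chunks x 768 dims x 4 bytes = 12,288 bytes, i.e. 12x the source text
    print(raw, raw / 1024)
```

Under these assumptions, 1 KB of text becomes 12 KB of raw vectors before any graph or posting structure is built on top.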
When multiplied across thousands of mostly idle tenants, this bloat breaks traditional search architectures. Industry-standard vector indexes like HNSW are fundamentally memory-bound: fast graph traversal relies on the index remaining resident in RAM, so hosting cold multi-tenant data is expensive.

Search needs a lakebase

Last year, we introduced Lakebase: a serverless Postgres OLTP architecture where data lives in cheap cloud object storage, while a tiered cache (RAM, local NVMe, pageserver) serves hot pages at local-disk latency. We realized this is exactly the architecture modern search needs. But there was a catch: to unlock these economics without destroying query speed, you need an index layout explicitly designed to live in a tiered storage hierarchy. Lakebase didn't have one, so we built it. By pairing a tiered architecture with a purpose-built tiered index, we achieve:

Next-level scale without the speed penalty: by fetching only the required pages from object storage into a local cache, smaller Postgres instances achieve the same recall and latency without requiring massive compute resources.

Next-level economics: the cold tail of vectors sits in nearly-free object storage, while the hot working set lives on NVMe. You pay for what you query, not what you store.
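The "fetch only the required pages" read path can be illustrated with a toy tiered cache. This is a hypothetical sketch, not Lakebase's pageserver: the class name, the per-tier page capacities, and the plain dict standing in for object storage are all assumptions. A read checks RAM first, then local NVMe, and only on a double miss fetches the single requested page from object storage, promoting it into the faster tiers.

```python
from collections import OrderedDict


class TieredPageCache:
    """Toy RAM -> NVMe -> object-storage read path with LRU eviction per tier.

    Hypothetical illustration only; capacities are counted in pages.
    """

    def __init__(self, object_store: dict, ram_pages: int, nvme_pages: int):
        self.object_store = object_store  # stands in for cheap, slow storage
        self.ram = OrderedDict()          # smallest, fastest tier
        self.nvme = OrderedDict()         # larger local cache tier
        self.ram_pages = ram_pages
        self.nvme_pages = nvme_pages
        self.hits = {"ram": 0, "nvme": 0, "object": 0}

    @staticmethod
    def _put(tier: OrderedDict, capacity: int, page_id, data) -> None:
        tier[page_id] = data
        tier.move_to_end(page_id)         # mark as most recently used
        while len(tier) > capacity:
            tier.popitem(last=False)      # evict the least recently used page

    def read(self, page_id):
        if page_id in self.ram:
            self.hits["ram"] += 1
            self.ram.move_to_end(page_id)
            return self.ram[page_id]
        if page_id in self.nvme:
            self.hits["nvme"] += 1
            data = self.nvme[page_id]
            self.nvme.move_to_end(page_id)
        else:
            self.hits["object"] += 1
            data = self.object_store[page_id]  # only the requested page is fetched
            self._put(self.nvme, self.nvme_pages, page_id, data)
        self._put(self.ram, self.ram_pages, page_id, data)
        return data


if __name__ == "__main__":
    store = {i: f"page-{i}" for i in range(10_000)}  # mostly cold data
    cache = TieredPageCache(store, ram_pages=4, nvme_pages=16)
    for _ in range(100):
        for page in (1, 2, 3):                        # small hot working set
            cache.read(page)
    print(cache.hits)  # {'ram': 297, 'nvme': 0, 'object': 3}
```

Only the three hot pages ever leave object storage; the other 9,997 stay cold and cost nothing in the cache tiers.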
The economics are easiest to see as a table. Per terabyte per month, at cloud list prices:

Where the data lives    Cost (per TB per month)
RAM                     ~$3,000
Local NVMe (cache)      ~$100
Object storage          ~$20
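A quick way to see "pay for what you query" is to price a placement of data across tiers using the approximate list prices above. The 1 TB index size and the 10% hot fraction below are hypothetical, and the sketch assumes that object storage holds a full copy of the data while NVMe caches only the hot slice. Compute and request costs are ignored.

```python
# Approximate list prices from the table above, in USD per TB per month.
PRICE_PER_TB_MONTH = {"ram": 3000.0, "nvme": 100.0, "object": 20.0}


def monthly_cost(tb_per_tier: dict) -> float:
    """Monthly storage bill for a given placement of data across tiers."""
    return sum(PRICE_PER_TB_MONTH[tier] * tb for tier, tb in tb_per_tier.items())


if __name__ == "__main__":
    index_tb = 1.0
    hot_fraction = 0.10  # hypothetical: 10% of the index is actively queried

    all_in_ram = monthly_cost({"ram": index_tb})
    # Tiered: every byte lives in object storage; the hot slice is also cached on NVMe.
    tiered = monthly_cost({"object": index_tb, "nvme": index_tb * hot_fraction})
    print(all_in_ram, tiered, all_in_ram / tiered)  # 3000.0 30.0 100.0
```

With these assumed inputs, the tiered placement is 100x cheaper. The ratio shrinks as the hot fraction grows, which is why the cold majority matters.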
Our indexing method lets Lakebase keep only the active working set in RAM. The cold majority rests in object storage, making the system two orders of magnitude cheaper while still delivering the high-performance search your application actually requires.

Bringing lake-native search indexes to Postgres

When building Lakebase Search, we had two strict, non-negotiable requirements: it had to be 100% Postgres-native (reusing the standard pgvector/tsvector types and ecosystem tools), and its indexes had to be built from the ground up for tiered cloud object storage. To achieve this, we are launching two new Postgres extensions in beta today. Both share the same goal: deliver state-of-the-art search relevance without forcing you to over-provision RAM.

lakebase_vector: 32x compression and 1B+ scale
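The 32x in this heading is straightforward arithmetic if, as an illustrative assumption, the uncompressed baseline is float32 and the compressed code spends one bit per dimension: 32 bits shrink to 1. For 100 million 768-dimensional vectors, that is 307.2 GB of raw floats versus 9.6 GB of bit codes. The sketch also shows plain sign-bit binary quantization of residuals against a cluster centroid, compared by Hamming distance. This is a much simpler relative of RaBitQ, not RaBitQ itself: RaBitQ additionally applies a random rotation and uses correction factors for an unbiased distance estimate, both omitted here. The function names are hypothetical.

```python
def bytes_needed(n_vectors: int, dims: int, bits_per_dim: int) -> int:
    """Raw storage for n vectors at a given precision, ignoring index metadata."""
    return n_vectors * dims * bits_per_dim // 8


def sign_bits(vector, centroid) -> int:
    """Pack the sign of each residual coordinate (vector - centroid) into an int.

    Simplified binary quantization, not RaBitQ: no random rotation and no
    distance-correction factors.
    """
    code = 0
    for i, (x, c) in enumerate(zip(vector, centroid)):
        if x - c >= 0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits; a cheap proxy for angular distance."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    n, d = 100_000_000, 768
    full = bytes_needed(n, d, 32)   # float32 baseline
    binary = bytes_needed(n, d, 1)  # one bit per dimension
    print(full / 1e9, binary / 1e9, full // binary)  # 307.2 9.6 32

    centroid = [0.0, 0.0, 0.0, 0.0]  # stands in for a cluster center
    query = sign_bits([0.9, -0.2, 0.4, -0.7], centroid)
    near = sign_bits([0.8, -0.1, 0.5, -0.6], centroid)
    far = sign_bits([-0.9, 0.3, -0.4, 0.7], centroid)
    print(hamming(query, near), hamming(query, far))  # 0 4
```

Comparing packed bits with XOR and popcount is far cheaper than floating-point dot products. This is why binary codes are typically used to shortlist candidates before any exact re-ranking.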
We retained the standard pgvector data types and operators but changed the underlying index type. Because the data remains in native pgvector format, it stays compatible with the pgvector ecosystem and can be exported to other systems. By clustering vectors and compressing them with RaBitQ (Randomized Binary Quantization), we shrink the index footprint 32x while maintaining high recall: a 100-million-vector index that previously required 300 GB of RAM fits into under 10 GB. This reduced memory footprint allows a single index to scale to over 1 billion vectors. The active working set is cached on local NVMe, while the cold tail resides in object storage.

lakebase_text: True BM25 without the GIN memory bloat
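BM25 is the ranking function named in this heading. Reciprocal rank fusion (RRF) is what merges BM25 results with vector results in a hybrid query. Both are short enough to sketch in plain Python. This is a textbook sketch, not lakebase_text's implementation, and it says nothing about how the extension lays out its index on object storage. It rests on several assumptions: whitespace tokenization, the common BM25 defaults k1 = 1.2 and b = 0.75, a Lucene-style IDF, and the RRF constant k = 60 commonly used in the literature. Whether Lakebase uses any of these values is not stated.

```python
from __future__ import annotations

import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every document for a whitespace-tokenized query."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for t in tokenized for term in set(t))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Lucene-style IDF, which stays non-negative for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: sum 1 / (k + rank) over every ranking a doc appears in."""
    fused: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


if __name__ == "__main__":
    docs = {
        "d1": "agents write memory to postgres",
        "d2": "postgres full text search with bm25",
        "d3": "object storage is cheap",
    }
    ids = list(docs)
    score_by_id = dict(zip(ids, bm25_scores("postgres bm25", [docs[i] for i in ids])))
    keyword_ranking = sorted(ids, key=score_by_id.get, reverse=True)
    vector_ranking = ["d1", "d3", "d2"]  # hypothetical output of a vector index
    print(keyword_ranking, rrf([vector_ranking, keyword_ranking]))
```

RRF uses only ranks, not raw scores. This is what makes it a convenient way to combine BM25 values and vector distances, which live on incomparable scales.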
Postgres handles exact keyword matching with GIN indexes, which must stay resident in RAM to perform well, so memory costs scale linearly with dataset size. lakebase_text replaces GIN with an index optimized for sequential reads from cloud object storage, bringing native BM25 relevance ranking to Postgres without the associated RAM footprint.

Because both extensions execute within the same engine, hybrid search runs in a single SQL query. Vector similarity and keyword relevance are combined via reciprocal rank fusion (RRF), and the fused results can be joined and filtered against operational tables.

Postgres is ready for large-scale, serious search workloads

We benchmarked Lakebase Search on LAION-100M (100 million 768-dimensional vectors, top-10 retrieval) on a single instance. Query performance with a warm cache and a single connection, with recall measured against exact nearest neighbors:

Recall@10    P99 latency    QPS
0.955        30 ms          51
0.942        18 ms          104
0.926        14 ms          142
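The Recall@10 column is assumed here to follow the usual definition, which the post does not spell out: the fraction of the true 10 nearest neighbors, found by exact brute-force search, that appear in the index's top 10, averaged over queries. The sketch below implements that metric with a brute-force ground truth. The function names are hypothetical, and Euclidean distance is an assumption; the benchmark's actual distance metric is not stated.

```python
import heapq
import math


def exact_top_k(query, vectors, k):
    """Ground truth: indices of the k nearest vectors by Euclidean distance (brute force)."""
    return heapq.nsmallest(k, range(len(vectors)), key=lambda i: math.dist(query, vectors[i]))


def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the true top-k neighbors that the approximate search returned."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k


def mean_recall_at_k(approx_results, exact_results, k=10):
    """Average recall@k over a batch of queries, as reported in benchmark tables."""
    pairs = list(zip(approx_results, exact_results))
    return sum(recall_at_k(a, e, k) for a, e in pairs) / len(pairs)


if __name__ == "__main__":
    vectors = [[float(i), 0.0] for i in range(20)]
    exact = exact_top_k([0.0, 0.0], vectors, 10)  # [0, 1, ..., 9]
    approx = list(range(9)) + [15]                # one true neighbor missed
    print(recall_at_k(approx, exact))             # 0.9
```

Under this definition, a recall of 0.955 means that, on average, between 9 and 10 of the true top-10 neighbors came back per query.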
Achieving this scale traditionally requires a memory-bound architecture. A standard pgvector HNSW index needs the neighbor graph and the heap pages it points to resident in RAM for fast traversal. At 100 million vectors:

pgvector: requires a 512 GB (64 CPU) instance, and the index build takes ~40 hours. Because graph traversal relies on...