Databricks (DBRX), published Jun 12, 2026

Talk to all your data, wherever it lives


Summary

* Connect Genie to data using Lakehouse Federation, avoiding the delays of a "big bang" migration.

* Leverage Unity Catalog as the source of truth for both federated and managed data, ensuring AI workloads are secure and production-ready.

* Start querying data in natural language immediately. Optimize performance by upgrading to Unity Catalog managed tables.

Agentic AI has created demand for cross-source reasoning that didn't exist 12 months ago. Business users want to ask natural language questions such as "which marketing campaigns drove the most ROI last quarter?" and get instant insights from their data. The problem is that enterprise data is frequently spread across multiple systems such as AWS Glue, Snowflake, Oracle, BigQuery, and Postgres, and is sometimes locked in legacy proprietary formats, so migrating everything to a single system could take months.

What if you didn't have to migrate the data and could still reason over your entire data estate? With Lakehouse Federation, Databricks connects directly to your existing sources, wherever they live, and brings them under a single governance layer in Unity Catalog. Permissions, lineage, and access controls work consistently across every connected system, so you get enterprise-grade security without rebuilding it source by source. Business users can then query that unified data in plain English through Genie, getting answers that span every connected platform without a single pipeline, copy, or migration step.

In this blog, we'll walk through how to set it up by connecting to an external source, syncing its metadata into Unity Catalog, and asking questions through Genie, all in minutes.

How it works

Lakehouse Federation allows users and AI agents to securely connect to an external source and govern it alongside native data. This enables Genie to access your extended data estate on the fly without requiring a migration. Lakehouse Federation connects to over 20 of the most popular data platforms. As an example, let's walk through how easy it is to set up with AWS Glue.

1. Connect to your external data sources with Lakehouse Federation

First, we create a connection to the external AWS Glue project. In this example, we connect to a Glue database containing marketing campaign data.

Next, we sync the data in-place to Unity Catalog. This provides access to all tables without having to copy any data, ensuring data is always up to date. It also avoids any disruption to the source system.
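As a rough sketch of what step 1 might look like in SQL, the snippet below builds a `CREATE CONNECTION` statement followed by a `CREATE FOREIGN CATALOG` statement. The connection, catalog, credential, and bucket names are hypothetical, and the option names (`aws_region`, `aws_account_id`, `credential`, `authorized_paths`) are assumptions to verify against the current Databricks documentation for Glue federation.

```python
def glue_federation_statements(connection, catalog, region, account_id,
                               credential, authorized_paths):
    """Build the two SQL statements for step 1, in execution order."""
    # Option names below are assumptions; check them against the Databricks docs.
    create_connection = (
        f"CREATE CONNECTION IF NOT EXISTS {connection} TYPE glue\n"
        "OPTIONS (\n"
        f"  aws_region '{region}',\n"
        f"  aws_account_id '{account_id}',\n"
        f"  credential '{credential}'\n"
        ")"
    )
    # The foreign catalog exposes the Glue metadata in Unity Catalog;
    # the table data itself stays where it is.
    paths = ",".join(authorized_paths)
    create_catalog = (
        f"CREATE FOREIGN CATALOG IF NOT EXISTS {catalog}\n"
        f"USING CONNECTION {connection}\n"
        f"OPTIONS (authorized_paths '{paths}')"
    )
    return [create_connection, create_catalog]


# All values here are hypothetical placeholders.
statements = glue_federation_statements(
    connection="glue_marketing_conn",
    catalog="glue_marketing",
    region="us-east-1",
    account_id="123456789012",
    credential="glue_service_credential",
    authorized_paths=["s3://example-marketing-bucket/campaigns"],
)

for stmt in statements:
    print(stmt + ";\n")
    # In a Databricks notebook, each statement could be run with spark.sql(stmt).
```

Building the statements as strings keeps the sketch runnable anywhere; on a workspace you would more likely run the SQL directly in the SQL editor or use the Catalog Explorer UI.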

2. Leverage your existing metadata

Raw table and column names are often meaningless to an AI model. An AI agent won't inherently know that status_code 4 means "Urgent" or that spend_amount refers to marketing costs. Many organizations have already invested in documenting their schemas in the source system, adding table descriptions, column comments, and business glossary terms directly in Glue.

Lakehouse Federation now brings that context forward automatically. When you create a foreign catalog, comments and descriptions from the source system are federated into Unity Catalog alongside the table metadata. This means:

* Existing column descriptions (e.g., "spend_amount: total marketing spend in USD") carry over without manual re-entry

* Table-level comments documenting business context are preserved

* AI tools like Genie can immediately leverage this metadata to understand your schema
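One way to check which comments actually carried over is to run `DESCRIBE TABLE` on a federated table and look for columns whose comment is still empty. The sketch below assumes a hypothetical table name and uses hard-coded sample rows in the `(col_name, data_type, comment)` shape that `DESCRIBE TABLE` returns, so it runs without a workspace.

```python
# Hypothetical three-level name for a table in the foreign catalog.
DESCRIBE_SQL = "DESCRIBE TABLE glue_marketing.campaigns.campaign_performance"


def undocumented_columns(rows):
    """Return names of columns that have no comment.

    rows: iterable of (col_name, data_type, comment) tuples.
    """
    missing = []
    for col_name, _data_type, comment in rows:
        # Skip blank separator rows and "# ..." section headers that
        # DESCRIBE TABLE may append (for example, partition information).
        if not col_name or col_name.startswith("#"):
            continue
        if not (comment or "").strip():
            missing.append(col_name)
    return missing


# Sample rows, invented for illustration. In a Databricks notebook they
# could come from: [tuple(r) for r in spark.sql(DESCRIBE_SQL).collect()]
sample_rows = [
    ("campaign_id", "string", "Unique campaign identifier"),
    ("spend_amount", "double", "total marketing spend in USD"),
    ("status_code", "int", None),
    ("# Partition Information", "", ""),
]

missing = undocumented_columns(sample_rows)
print(missing)
```

Columns that show up in this list are the ones where an AI tool would still have only the raw name to go on.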

Today, we support foreign table comments on Glue and BigQuery. In preview, we have expanded support to PostgreSQL, Redshift, MySQL, and Snowflake, and we plan to add more sources each month (Sign up for the preview).

3. Define reusable semantics on top of your federated data

Inherited comments tell Genie what your data is, but they can't capture how your business measures things. A column comment can explain that spend_amount is marketing cost in USD, but only a metric definition can encode that ROI is impressions divided by spend. That's business logic, and historically it has lived in scattered dashboard formulas, ad hoc SQL, and tribal knowledge, often with subtly different definitions across teams.

Unity Catalog Semantics lets you define that business logic once as a governed object, so every tool that queries it gets the same trusted calculation. And because federated tables are first-class citizens in Unity Catalog, this works on data that never left its source system. You can define metrics like ROI directly on any federated source, no migration required.

With Unity Catalog metrics, you define ROI once, directly on the federated table. The metric view defines two things: fields like campaign_id and quarter that users can group and filter by, and a measure, roi, that encodes the business formula itself. Genie, AI/BI dashboards, and notebooks then all compute it identically. When the definition changes, you update it in one place and every consumer inherits the change.

4. Ask Genie

With the data connected and contextualized, your marketing analyst can now open a Genie room and ask the question we started with: "Which marketing campaigns drove the most ROI last quarter?" Genie doesn't have to reconstruct the ROI formula from scratch; it resolves to the certified roi measure in the metric view and automatically generates the correct SQL against the federated data.
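Steps 3 and 4 can be sketched together. The first statement below defines a metric view with the two dimensions and the roi measure described above; the second is the kind of query a tool might issue against it. This is an illustration, not Genie's actual output. The view name, source table, the campaign_date and impressions columns, the YAML version, and the choice to aggregate ROI as SUM(impressions) / SUM(spend_amount) are all assumptions.

```python
# Metric view definition (Databricks SQL with an embedded YAML body).
# The source points at a table in the hypothetical foreign catalog.
METRIC_VIEW_DDL = """\
CREATE OR REPLACE VIEW main.marketing.campaign_metrics
WITH METRICS
LANGUAGE YAML
AS $$
version: 0.1
source: glue_marketing.campaigns.campaign_performance
dimensions:
  - name: campaign_id
    expr: campaign_id
  - name: quarter
    expr: DATE_TRUNC('QUARTER', campaign_date)
measures:
  - name: roi
    expr: SUM(impressions) / SUM(spend_amount)
$$"""

# A query over the metric view. MEASURE() evaluates the governed formula,
# so the caller never restates how ROI is calculated.
TOP_CAMPAIGNS_SQL = """\
SELECT campaign_id, MEASURE(roi) AS campaign_roi
FROM main.marketing.campaign_metrics
WHERE quarter = DATE_TRUNC('QUARTER', ADD_MONTHS(CURRENT_DATE(), -3))
GROUP BY campaign_id
ORDER BY campaign_roi DESC
LIMIT 10"""

for stmt in (METRIC_VIEW_DDL, TOP_CAMPAIGNS_SQL):
    print(stmt + ";\n")
    # In a Databricks notebook, each statement could be run with spark.sql(stmt).
```

Because the formula lives only in the view definition, changing how ROI is computed means editing the `expr` of the roi measure; the query stays the same.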

The result? An immediate, accurate answer derived from live data sitting in Glue.

Genie, powered by Lakehouse Federation, is just one example of how Unity Catalog enables AI insights across your entire data estate. Whether the query comes from a business analyst in a Genie room or an agent-powered workflow, Unity Catalog provides the governed, contextualized foundation that makes it work.

What's next

We're continuing to invest in making Lakehouse Federation the fastest on-ramp to the Databricks Platform:

* Richer business semantics for federated tables: Beyond importing existing comments, we're building new ways to augment your federated metadata with AI-powered descriptions and business context, making Genie even smarter out of the box.

* Improved performance by upgrading to managed tables: Use the SET MANAGED feature to convert a foreign table to a Unity Catalog managed table in Databricks, and benefit from 50%+ cost savings and 20x faster queries.

* Federation support for more catalogs and platforms: We continue to add new federation...
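The managed-table upgrade mentioned above might be issued as one `ALTER TABLE ... SET MANAGED` statement per table. The sketch below generates those statements for a hypothetical list of federated tables; the table names are invented, and prerequisites such as required privileges or runtime versions are not covered here.

```python
def set_managed_statement(table_name):
    """Build the statement that converts one table to a managed table."""
    return f"ALTER TABLE {table_name} SET MANAGED"


# Hypothetical three-level table names in the foreign catalog.
tables_to_upgrade = [
    "glue_marketing.campaigns.campaign_performance",
    "glue_marketing.campaigns.campaign_costs",
]

upgrade_statements = [set_managed_statement(t) for t in tables_to_upgrade]

for stmt in upgrade_statements:
    print(stmt + ";")
    # In a Databricks notebook, each statement could be run with spark.sql(stmt).
```

Converting table by table lets you start with the tables that are queried most often and leave the rest federated.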

Excerpt shown — open the source for the full document.
