How Stagwell built privacy-safe ID matching on Databricks
Summary
Brands struggle to securely match fragmented first-party data with identity graphs without exposing sensitive information.
Databricks Marketplace-powered clean room apps enable plug-and-play, privacy-safe identity matching at scale, ensuring data never leaves the customer's environment.
Stagwell’s solution combines Databricks Clean Rooms, Stagwell ID Spine, and app orchestration to move from raw data to actionable audiences via their Agentic Targeting System (SATS), all without exposing raw records from either side.
The identity matching problem brands face today
Brands invest heavily in building first-party data assets, including purchase histories, CRM records, loyalty programs, and website interactions. That data is fragmented across systems and difficult to activate across channels. However, first-party data alone only tells part of the story. To build complete audience profiles, brands need to match their records against identity providers' spines for cross-channel identity graphs spanning email, device IDs, cookies, and offline touchpoints.
The traditional approach is painful. Brands export customer records to a third-party platform, the identity provider runs its matching algorithms, and results come back days later. Every step introduces risk: data leaves the brand's secure environment, PII travels across networks, and compliance teams must review data-sharing agreements that can take weeks to negotiate.
At the same time, privacy regulations and platform restrictions have made:
- Third-party cookies unreliable
- Data sharing risky
- Identity stitching more complex
This creates a fundamental gap: brands have data but lack the ability to connect it to a unified identity layer safely.
To bridge this, brands need to:
- Match their data against a comprehensive identity graph
- Enrich it with additional signals and attributes
- Do so while protecting raw user-level data
The Marketing Cloud, a Global Marketing Services Agency, a Stagwell company, experienced this friction firsthand across their brand clients. They pushed for a better model: one where brands could access Stagwell's identity matching capabilities without ever sending their raw data outside their own infrastructure.
How Marketplace Apps change the distribution model
Traditional clean room implementations are high-touch, engineering-heavy, and often slow to deploy. Databricks Marketplace Apps flip the traditional data-sharing model. Instead of "send us your data and we will process it," the model becomes "install our app and it runs where your data already lives." Brands can now install a pre-built application, connect their data, and run identity matching workflows instantly.
When an application is published to the Databricks Marketplace, any brand with a Databricks workspace can request access and install it directly. The app runs inside the brand's own environment with its own auto-provisioned service principal. The brand's data never crosses a network boundary.
This is a fundamental shift for data providers. Previously, distributing proprietary algorithms meant either exposing source code (which partners will not do) or requiring brands to export data (which compliance teams resist). Marketplace Apps solve both problems: the app's code is containerized and opaque to the consumer, while the brand's data stays in their Unity Catalog. With marketplace distribution, deployment time drops from months to minutes, standardized workflows improve usability, and governance is baked into the platform. Stagwell was among the first partners to put this model into production.
What Stagwell built and how it works
Stagwell built a marketplace-ready clean room application on Databricks that enables secure ingestion of brand first-party data, matching against the Stagwell Identity Spine, privacy-safe insights generation, and seamless transition to audience creation and activation.
At its core, the system combines Databricks Clean Rooms for secure collaboration, Unity Catalog for governance and access control, Jobs and Notebooks for identity matching execution, and a React and Express app layer for user experience.
Here’s how the end-to-end flow works.
Step 1: Install and authenticate
An administrator on the brand side discovers Stagwell's app in the Databricks Marketplace and installs it into their workspace. During installation, the admin needs to authorize the app and bind it to the resources it needs: a SQL warehouse for queries and any secrets for configuration. The app receives an auto-provisioned service principal with credentials injected as environment variables. No manual credential setup is required.
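To make the "credentials injected as environment variables" step concrete, here is a minimal Python sketch of how an app process might read them at startup. The variable names `DATABRICKS_HOST`, `DATABRICKS_CLIENT_ID`, and `DATABRICKS_CLIENT_SECRET` are assumptions based on common Databricks conventions, and `AppCredentials` and `load_app_credentials` are hypothetical helpers, not part of Stagwell's app.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class AppCredentials:
    """Service principal credentials the platform injects into the app."""
    host: str
    client_id: str
    client_secret: str


def load_app_credentials(env: Mapping[str, str] = os.environ) -> AppCredentials:
    """Read injected credentials, failing fast if any are absent."""
    # Assumed variable names; confirm them against your workspace's app settings.
    required = ("DATABRICKS_HOST", "DATABRICKS_CLIENT_ID", "DATABRICKS_CLIENT_SECRET")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing injected credentials: {', '.join(missing)}")
    return AppCredentials(
        host=env["DATABRICKS_HOST"],
        client_id=env["DATABRICKS_CLIENT_ID"],
        client_secret=env["DATABRICKS_CLIENT_SECRET"],
    )
```

Passing the environment in as a mapping keeps the helper easy to test and makes it explicit that nothing is typed in by hand: the app only reads what the platform provides.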
Step 2: Connect brand data
When a brand user opens the app, they authenticate through their workspace's standard OAuth flow. The app uses On-Behalf-Of (OBO) authorization to access the brand's data with the logged-in user's identity. This means every Unity Catalog ACL, row filter, and column mask is enforced automatically. The app sees exactly what that user is authorized to see - nothing more.
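As a rough illustration of the On-Behalf-Of idea, the sketch below pulls the logged-in user's token out of an incoming request's headers and builds the authorization header for downstream calls, so that queries run as the user rather than as the app. The header name `x-forwarded-access-token` is an assumption about how the platform forwards the token, and both function names are hypothetical.

```python
from typing import Dict, Mapping

# Assumed header name for the forwarded user token; confirm for your deployment.
USER_TOKEN_HEADER = "x-forwarded-access-token"


def user_token_from_headers(headers: Mapping[str, str]) -> str:
    """Return the logged-in user's forwarded access token, or raise if absent."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    token = normalized.get(USER_TOKEN_HEADER, "").strip()
    if not token:
        # Refuse to fall back to the app's own identity, which could
        # bypass the user's ACLs, row filters, and column masks.
        raise PermissionError("No user token was forwarded with this request")
    return token


def bearer_headers(token: str) -> Dict[str, str]:
    """Build the Authorization header for a call made as the user."""
    return {"Authorization": f"Bearer {token}"}
```

The design point is the refusal to fall back: if the user's token is missing, the request fails instead of silently running with broader app-level permissions.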
Step 3: Initiate the clean room match
The brand user selects which first-party tables to match and triggers the process. Behind the scenes, the app calls Stagwell's backend to create a Packaged Clean Room. Stagwell contributes their Identity Spine data and a matching notebook, then designates the brand as the runner. The "packaged" designation is key: it eliminates the approval workflow that standard clean rooms require. The brand can execute the matching notebook immediately. And critically, the brand can see the notebook's name but not its source code - protecting Stagwell's proprietary matching logic.
Step 4: Run the identity match
The brand runs the matching notebook inside the clean room, which performs the following operations:
- Joins brand data with the ID Spine
- Resolves identities across multiple identifiers
- Computes match rates, coverage metrics, and household and consumer IDs
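Stagwell's matching notebook is not visible to the brand, so its logic cannot be shown here. Purely to illustrate what "join, resolve, and compute a match rate" means, the following Python sketch joins brand records to a toy spine on a single identifier. Joining on a SHA-256 hash of the normalized email is an assumption made for this example, as are the function names and the spine layout. A real spine resolves across multiple identifiers.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def hash_email(email: str) -> str:
    """Normalize an email (trim, lowercase) and return its SHA-256 hex digest."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def match_to_spine(
    brand_emails: Iterable[str],
    spine: Dict[str, Tuple[str, str]],
) -> List[Dict[str, str]]:
    """Join brand records to a toy spine keyed by hashed email.

    `spine` maps hashed email -> (consumer_id, household_id); this layout
    is hypothetical and exists only for the illustration.
    """
    matches = []
    for email in brand_emails:
        key = hash_email(email)
        if key in spine:
            consumer_id, household_id = spine[key]
            matches.append(
                {"hashed_email": key, "consumer_id": consumer_id, "household_id": household_id}
            )
    return matches


def match_rate(total_records: int, matched_records: int) -> float:
    """Share of brand records that found a spine entry (0.0 for empty input)."""
    return matched_records / total_records if total_records else 0.0
```

A short usage example: with four brand records and a spine that knows two of them, `match_rate(4, len(matches))` is `0.5`, and counting distinct `household_id` values in `matches` gives the number of households reached.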
The notebook reads from both parties' input catalogs and writes results to a shared output schema. Both Stagwell and the brand can see the match results via Delta Sharing. The brand's raw customer data is never visible to Stagwell. Stagwell's matching algorithms are never visible to the brand. The clean room enforces this separation at the platform level. All processing...