Repo · Microsoft · published Feb 28, 2024

microsoft/dbt-fabricspark

Python


Description: A dbt adapter for Fabric Lakehouse backed by Apache Spark.

Language: Python

License: MIT

Stars: 62

Forks: 35

Open issues: 0

Created: 2024-02-28T22:49:23Z

Pushed: 2026-06-13T02:57:36Z

Default branch: main

Fork: no

Archived: no

README:

Fabric Spark - dbt

dbt adapter for Fabric Spark supporting SQL models.

dbt Docs · Fabric Lakehouse with Spark · Fabric Lakehouse Livy API

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.

dbt is the T in ELT. Organize, cleanse, denormalize, filter, rename, and pre-aggregate the raw data in your warehouse so that it's ready for analysis.

dbt-fabricspark

The dbt-fabricspark package contains all of the code enabling dbt to work with Apache Spark in Microsoft Fabric. This adapter connects to Fabric Lakehouses via Livy endpoints and supports both schema-enabled and non-schema Lakehouse configurations.

Key Features

  • Livy session management with session reuse and robust connectivity across dbt runs
  • Lakehouse with schema support — auto-detects schema-enabled lakehouses and uses three-part naming (lakehouse.schema.table)
  • Lakehouse without schema — standard two-part naming (lakehouse.table)
  • Materializations: table, view, incremental (append, merge, insert_overwrite), seed, snapshot
  • Fabric Environment support via environmentId configuration
  • Security: credential masking, UUID validation, HTTPS + domain validation, thread-safe token refresh
  • Resilience: HTTP 5xx retry with exponential backoff, bounded polling with configurable timeouts
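The retry behaviour named in the last bullet can be illustrated with a minimal sketch. This is not the adapter's actual implementation: `call_with_backoff`, its parameters, and the `send` callable are hypothetical names chosen for illustration, and the doubling delay schedule is an assumption about what "exponential backoff" means here.

```python
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() and retry while it returns an HTTP 5xx status.

    `send` is a hypothetical callable returning (status_code, body).
    The delay doubles after each failed attempt: base, 2*base, 4*base, ...
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        if not 500 <= status < 600:
            # Success, or a client error that a retry would not fix.
            return status, body
        if attempt < max_retries:
            sleep(base_delay * (2 ** attempt))
    # Still failing after the final retry: hand the last response back.
    return status, body


# Usage with a fake endpoint that fails twice and then succeeds.
# The injected sleep records the delays instead of actually waiting.
responses = iter([(503, "busy"), (502, "bad gateway"), (200, "ok")])
delays = []
result = call_with_backoff(lambda: next(responses), sleep=delays.append)
print(result, delays)
```

Injecting `sleep` keeps the sketch testable without real waiting; a production client would typically also cap the maximum delay.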

Getting started

> To contribute to this adapter codebase, see [CONTRIBUTING.md](CONTRIBUTING.md).

Installation

pip install dbt-fabricspark

For local development using Azure CLI authentication (authentication: CLI), install with the cli extra:

pip install dbt-fabricspark[cli]

> Note: azure-cli is an optional dependency that is required only for the CLI authentication mode. Service Principal (SPN) and Fabric Notebook (fabric_notebook) authentication modes do not need it.

Issues, bug-bashing, help us help you

> ⚠️ Here's how you can get your issue triaged and fixed ASAP

In the age of AI, we should be innovating and shipping high-quality software daily.

So in this adapter, we try to fix bugs and ship features fast - and keep an extremely high bar for test coverage in CI before PRs merge to main.

When you open an issue, please be as descriptive as possible so that our human/AI maintainers have the details they need to reproduce it rapidly.

For example, if you can, create a dummy repro dbt project in your GitHub account so the maintainers can reproduce your problem ASAP - see an example of a well-written GitHub issue here.

If the issue is complex, once we have identified a fix we might ask you to install the adapter straight from a PR branch to confirm that the repro is gone in your setup as well, like so:

pip install git+https://github.com/microsoft/dbt-fabricspark.git@dev/somebranch/123

Configuration

The adapter connects to Apache Spark in Microsoft Fabric through a Livy endpoint, which you configure in profiles.yml.

Connection Modes

The adapter supports two connection modes via the livy_mode setting:

  • Local mode (livy_mode: local) — Connects to a self-hosted Spark instance running in a Docker container (contributed by @mdrakiburrahman). This mode supports the reuse_session flag and does not require Fabric compute, making it ideal for offline development and testing.
  • Fabric mode (livy_mode: fabric, default) — Connects to Apache Spark in Microsoft Fabric via the Fabric Livy API. For development workflows, enable reuse_session: true to persist the Livy session ID to a local file (configured via session_id_file, defaults to ./livy-session-id.txt). On subsequent dbt runs, the adapter reuses the existing session from the persisted file instead of creating a new one. If the file does not exist or the session has expired, a new session is created automatically.
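The session-reuse flow described for Fabric mode can be sketched as follows. This is an illustration only, not the adapter's code: `resolve_session_id`, `is_session_alive`, and `create_session` are hypothetical names standing in for whatever calls the adapter makes against the Livy API.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional


def resolve_session_id(
    session_id_file: str,
    is_session_alive: Callable[[str], bool],
    create_session: Callable[[], str],
) -> str:
    """Reuse the persisted Livy session ID if it is still usable.

    Both callables are hypothetical stand-ins for Livy API calls.
    """
    path = Path(session_id_file)
    saved: Optional[str] = None
    if path.exists():
        saved = path.read_text().strip() or None
    if saved and is_session_alive(saved):
        return saved  # existing session is reused
    # File missing, empty, or session expired: start a new session
    # and persist its ID for the next run.
    new_id = create_session()
    path.write_text(new_id)
    return new_id


# Usage: three consecutive "dbt runs" against a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    id_file = str(Path(tmp) / "livy-session-id.txt")
    first = resolve_session_id(id_file, lambda s: True, lambda: "session-1")
    second = resolve_session_id(id_file, lambda s: True, lambda: "session-2")
    third = resolve_session_id(id_file, lambda s: False, lambda: "session-3")
print(first, second, third)
```

The first call finds no file and creates a session, the second reuses it, and the third replaces it because the liveness check reports that it has expired.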

Lakehouse without Schema

For standard Lakehouses (schema not enabled), use two-part naming. The schema field is set to the lakehouse name:

fabric-spark-test:
  target: fabricspark-dev
  outputs:
    fabricspark-dev:
      # Connection
      type: fabricspark
      method: livy
      endpoint: https://api.fabric.microsoft.com/v1
      workspaceid:
      lakehouseid:
      lakehouse: my_lakehouse
      schema: my_lakehouse
      threads: 1

      # Authentication (CLI for local dev, SPN for CI/CD)
      authentication: CLI
      # client_id: # Required for SPN
      # tenant_id: # Required for SPN
      # client_secret: # Required for SPN

      # Fabric Environment (optional)
      # environmentId:

      # Session management
      reuse_session: true
      # session_idle_timeout: "30m"             # Opt-in only. Setting this triggers
      #                                         # Fabric to bypass starter pools and
      #                                         # cold-start an on-demand cluster.
      # session_id_file: ./livy-session-id.txt  # Default path

      # Timeouts
      connect_retries: 1
      connect_timeout: 10
      http_timeout: 120            # Seconds per HTTP request
      session_start_timeout: 600   # Max wait for session start (10 min)
      statement_timeout: 3600      # Max wait for statement result (1 hour)
      poll_wait: 10                # Seconds between session start polls
      poll_statement_wait: 5       # Seconds between statement result polls

      # Retry & Shortcuts
      retry_all: true
      # create_shortcuts: false
      # shortcuts_json_str: ''

      # Spark configuration (optional)
      # spark_config:
      #   name: "my-spark-session"
      #   spark.executor.memory: "4g"

In this mode:

  • Tables are referenced as lakehouse.table_name
  • The schema field should match the lakehouse name
  • All objects are created directly under the lakehouse
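The timeout settings in the profile above pair a maximum wait with a polling interval (for example, statement_timeout: 3600 with poll_statement_wait: 5, which allows roughly 720 polls). A minimal sketch of such a bounded polling loop follows. It is an assumption about the general technique, not the adapter's code: `poll_until_done`, `get_state`, and `FakeClock` are hypothetical, and the terminal state names are assumed to follow Livy's statement states.

```python
import time
from typing import Callable

# Assumed terminal states, modelled on Livy statement states.
TERMINAL_STATES = ("available", "error", "cancelled")


def poll_until_done(
    get_state: Callable[[], str],
    timeout: float,
    interval: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() every `interval` seconds for at most `timeout` seconds."""
    deadline = clock() + timeout
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if clock() + interval > deadline:
            # The next poll would land past the deadline: give up.
            raise TimeoutError(f"no result after {timeout} seconds")
        sleep(interval)


class FakeClock:
    """Deterministic clock so the example runs instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


# Usage with the statement settings from the profile.
fake = FakeClock()
states = iter(["waiting", "running", "running", "available"])
final = poll_until_done(
    lambda: next(states), timeout=3600, interval=5,
    clock=fake.time, sleep=fake.sleep,
)
print(final, fake.now)
```

Bounding the loop by a deadline rather than a poll count keeps the total wait predictable even if individual status requests are slow.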

Lakehouse with Schema (Schema-Enabled)

For schema-enabled Lakehouses, you can organize tables into schemas within the lakehouse. The adapter auto-detects whether a lakehouse has schemas enabled via the Fabric REST API (properties.defaultSchema):

fabric-spark-test:
  target: fabricspark-dev
  outputs:
    fabricspark-dev:
      type: fabricspark
      method: livy...
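The auto-detection described for schema-enabled Lakehouses, combined with the two-part and three-part naming rules, can be sketched roughly as below. The response dictionaries are hypothetical examples shaped after the properties.defaultSchema field mentioned above, and `default_schema` and `relation_name` are illustrative names, not adapter functions.

```python
from typing import Any, Dict, Optional


def default_schema(lakehouse: Dict[str, Any]) -> Optional[str]:
    """Return properties.defaultSchema if the payload carries it, else None."""
    return (lakehouse.get("properties") or {}).get("defaultSchema")


def relation_name(
    lakehouse: Dict[str, Any],
    lakehouse_name: str,
    table: str,
    schema: Optional[str] = None,
) -> str:
    """Build a two-part or three-part name depending on schema support."""
    detected = default_schema(lakehouse)
    if detected is None:
        # No schemas: lakehouse.table
        return f"{lakehouse_name}.{table}"
    # Schema-enabled: lakehouse.schema.table, falling back to the default schema.
    return f"{lakehouse_name}.{schema or detected}.{table}"


# Hypothetical REST payloads; the real response has more fields.
plain = {"displayName": "my_lakehouse", "properties": {}}
with_schema = {"displayName": "my_lakehouse", "properties": {"defaultSchema": "dbo"}}

two_part = relation_name(plain, "my_lakehouse", "orders")
three_part = relation_name(with_schema, "my_lakehouse", "orders", schema="sales")
fallback = relation_name(with_schema, "my_lakehouse", "orders")
print(two_part, three_part, fallback)
```

Treating a missing defaultSchema as "schemas not enabled" is an assumption of this sketch; the source only states that the adapter inspects that property.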
