microsoft/dbt-fabricspark
Description: A dbt adapter for Fabric Lakehouse backed by Apache Spark.
Language: Python
License: MIT
Stars: 62
Forks: 35
Open issues: 0
Created: 2024-02-28T22:49:23Z
Pushed: 2026-06-13T02:57:36Z
Default branch: main
Fork: no
Archived: no
README:
Fabric Spark - dbt
dbt adapter for Fabric Spark supporting SQL models.
dbt Docs · Fabric Lakehouse with Spark · Fabric Lakehouse Livy API
dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
dbt is the T in ELT. Organize, cleanse, denormalize, filter, rename, and pre-aggregate the raw data in your warehouse so that it's ready for analysis.
dbt-fabricspark
The dbt-fabricspark package contains all of the code enabling dbt to work with Apache Spark in Microsoft Fabric. This adapter connects to Fabric Lakehouses via Livy endpoints and supports both schema-enabled and non-schema Lakehouse configurations.
Key Features
- Livy session management with session reuse and robust connectivity across dbt runs
- Lakehouse with schema support: auto-detects schema-enabled lakehouses and uses three-part naming (lakehouse.schema.table)
- Lakehouse without schema: standard two-part naming (lakehouse.table)
- Materializations: table, view, incremental (append, merge, insert_overwrite), seed, snapshot
- Fabric Environment support via environmentId configuration
- Security: credential masking, UUID validation, HTTPS + domain validation, thread-safe token refresh
- Resilience: HTTP 5xx retry with exponential backoff, bounded polling with configurable timeouts
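As an illustration of the "HTTP 5xx retry with exponential backoff" item, here is a minimal sketch of the general technique. It is not the adapter's actual code: the function name call_with_retry, its parameters, and the default delays are all hypothetical, and the send callable stands in for whatever performs the real HTTP request.

```python
import time
from typing import Callable, Tuple


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-5xx status or attempts run out.

    Hypothetical helper: send() returns (status_code, body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status < 500:
            # Success or a client error: retrying would not help.
            return status, body
        if attempt < max_attempts - 1:
            # Double the wait after each failed attempt: 1s, 2s, 4s, ...
            sleep(base_delay * 2 ** attempt)
    # Every attempt returned 5xx; hand back the last response.
    return status, body
```

Passing sleep as a parameter keeps the sketch testable without real waiting; a production version would typically also add jitter and cap the maximum delay.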
Getting started
- Install dbt
- Read the introduction and viewpoint
> To contribute to this adapter codebase, see [CONTRIBUTING.md](CONTRIBUTING.md).
Installation
pip install dbt-fabricspark
For local development using Azure CLI authentication (authentication: CLI), install with the cli extra:
pip install dbt-fabricspark[cli]
> Note: azure-cli is an optional dependency, required only for the CLI authentication mode. Service Principal (SPN) and Fabric Notebook (fabric_notebook) authentication modes do not need it.
Issues, bug-bashing, help us help you
> ⚠️ Here's how you can get your issue triaged and fixed ASAP
In the age of AI, we should be innovating and shipping high-quality software daily.
So in this adapter, we try to fix bugs and ship features fast - and keep an extremely high bar for test coverage in CI before PRs merge to main.
Once you open an issue, please try to be as descriptive as possible, so that our human/AI maintainers have the details they need to reproduce it rapidly.
For example, if you can, create a dummy repro dbt project in your GitHub account so the maintainers can reproduce your problem ASAP. See an example of a well-written GitHub issue here.
If the issue is complex, once we have identified a fix we might ask you to install the adapter directly from a PR branch, to confirm the repro is gone in your setup as well, like so:
pip install git+https://github.com/microsoft/dbt-fabricspark.git@dev/somebranch/123
Configuration
Configure your profiles.yml to connect to Apache Spark in Microsoft Fabric through a Livy endpoint.
Connection Modes
The adapter supports two connection modes via the livy_mode setting:
- Local mode (livy_mode: local): Connects to a self-hosted Spark instance running in a Docker container (contributed by @mdrakiburrahman). This mode supports the reuse_session flag and does not require Fabric compute, making it ideal for offline development and testing.
- Fabric mode (livy_mode: fabric, default): Connects to Apache Spark in Microsoft Fabric via the Fabric Livy API. For development workflows, enable reuse_session: true to persist the Livy session ID to a local file (configured via session_id_file, defaults to ./livy-session-id.txt). On subsequent dbt runs, the adapter reuses the existing session from the persisted file instead of creating a new one. If the file does not exist or the session has expired, a new session is created automatically.
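The session-reuse behaviour described for reuse_session can be pictured with the following sketch. It only mirrors the decision flow stated above (reuse the persisted ID if it is still valid, otherwise create and persist a new one) and is not taken from the adapter's source: get_session_id, is_alive, and create_session are hypothetical names, with the two callables standing in for the real Livy API calls.

```python
from pathlib import Path
from typing import Callable


def get_session_id(
    session_id_file: str,
    is_alive: Callable[[str], bool],
    create_session: Callable[[], str],
) -> str:
    """Return a usable Livy session ID, reusing a persisted one if possible.

    Hypothetical helper: is_alive() and create_session() are placeholders
    for the calls that would query or start a session.
    """
    path = Path(session_id_file)
    if path.exists():
        saved = path.read_text().strip()
        # Reuse only if the file held an ID and that session still exists.
        if saved and is_alive(saved):
            return saved
    # No file, empty file, or expired session: start fresh and persist.
    new_id = create_session()
    path.write_text(new_id)
    return new_id
```

With the default setting this would be called with ./livy-session-id.txt as session_id_file, so consecutive dbt runs in the same directory share one session.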
Lakehouse without Schema
For standard Lakehouses (schema not enabled), use two-part naming. The schema field is set to the lakehouse name:
fabric-spark-test:
  target: fabricspark-dev
  outputs:
    fabricspark-dev:
      # Connection
      type: fabricspark
      method: livy
      endpoint: https://api.fabric.microsoft.com/v1
      workspaceid:
      lakehouseid:
      lakehouse: my_lakehouse
      schema: my_lakehouse
      threads: 1

      # Authentication (CLI for local dev, SPN for CI/CD)
      authentication: CLI
      # client_id:       # Required for SPN
      # tenant_id:       # Required for SPN
      # client_secret:   # Required for SPN

      # Fabric Environment (optional)
      # environmentId:

      # Session management
      reuse_session: true
      # session_idle_timeout: "30m"   # Opt-in only. Setting this triggers
      #                               # Fabric to bypass starter pools and
      #                               # cold-start an on-demand cluster.
      # session_id_file: ./livy-session-id.txt   # Default path

      # Timeouts
      connect_retries: 1
      connect_timeout: 10
      http_timeout: 120             # Seconds per HTTP request
      session_start_timeout: 600    # Max wait for session start (10 min)
      statement_timeout: 3600       # Max wait for statement result (1 hour)
      poll_wait: 10                 # Seconds between session start polls
      poll_statement_wait: 5        # Seconds between statement result polls

      # Retry & Shortcuts
      retry_all: true
      # create_shortcuts: false
      # shortcuts_json_str: ''

      # Spark configuration (optional)
      # spark_config:
      #   name: "my-spark-session"
      #   spark.executor.memory: "4g"
In this mode:
- Tables are referenced as lakehouse.table_name
- The schema field should match the lakehouse name
- All objects are created directly under the lakehouse
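To make the difference between two-part and three-part naming concrete, the sketch below builds a relation name for either kind of lakehouse. The helper relation_name and its schema_enabled flag are hypothetical and exist only for illustration; the adapter's own relation handling may well look different.

```python
def relation_name(lakehouse: str, schema: str, table: str, schema_enabled: bool) -> str:
    """Build a dotted relation name for a Fabric Lakehouse table.

    Hypothetical helper: schema_enabled says whether the lakehouse
    has schemas turned on.
    """
    if schema_enabled:
        # Schema-enabled lakehouse: three-part naming.
        return f"{lakehouse}.{schema}.{table}"
    # Standard lakehouse: the schema field only repeats the lakehouse
    # name, so it is left out of the two-part name.
    return f"{lakehouse}.{table}"


# Standard lakehouse, schema set to the lakehouse name as in the profile above.
print(relation_name("my_lakehouse", "my_lakehouse", "orders", schema_enabled=False))
# Schema-enabled lakehouse with a hypothetical "sales" schema.
print(relation_name("my_lakehouse", "sales", "orders", schema_enabled=True))
```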
Lakehouse with Schema (Schema-Enabled)
For schema-enabled Lakehouses, you can organize tables into schemas within the lakehouse. The adapter auto-detects whether a lakehouse has schemas enabled via the Fabric REST API (properties.defaultSchema):
fabric-spark-test:
  target: fabricspark-dev
  outputs:
    fabricspark-dev:
      type: fabricspark
      method: livy
      ...