RepoDatabricks (DBRX)Databricks (DBRX)published Jan 18, 2024seen 5d

databricks/databricks-sqlalchemy

Python

Open original ↗

Captured source

source ↗

databricks/databricks-sqlalchemy

Description: See PECO-1396 for more details about this repository.

Language: Python

License: Apache-2.0

Stars: 24

Forks: 16

Open issues: 23

Created: 2024-01-18T21:55:41Z

Pushed: 2026-06-03T20:33:46Z

Default branch: main

Fork: no

Archived: no

README:

Databricks dialect for SQLALchemy 2.0

The Databricks dialect for SQLAlchemy serves as bridge between SQLAlchemy and the Databricks SQL Python driver. A working example demonstrating usage can be found in sqlalchemy_example.py.

Installation

To install the dialect and its dependencies:

pip install databricks-sqlalchemy

If you also plan to use alembic you can alternatively run:

pip install alembic

Connection String

Every SQLAlchemy application that connects to a database needs to use an Engine, which you can create by passing a connection string to create_engine. The connection string must include these components:

1. Host 2. HTTP Path for a compute resource 3. API access token 4. Initial catalog for the connection 5. Initial schema for the connection

Note: Our dialect is built and tested on workspaces with Unity Catalog enabled. Support for the `hive_metastore` catalog is untested.

For example:

import os
from sqlalchemy import create_engine

host = os.getenv("DATABRICKS_SERVER_HOSTNAME")
http_path = os.getenv("DATABRICKS_HTTP_PATH")
access_token = os.getenv("DATABRICKS_TOKEN")
catalog = os.getenv("DATABRICKS_CATALOG")
schema = os.getenv("DATABRICKS_SCHEMA")

engine = create_engine(
f"databricks://token:{access_token}@{host}?http_path={http_path}&catalog={catalog}&schema={schema}"
)

Connection URL parameters and connect_args

The Databricks SQLAlchemy dialect accepts dialect-specific options in the SQLAlchemy connection URL query string:

| Parameter | Required | Default | Description | |-|-|-|-| | http_path | Yes | | HTTP path for the Databricks SQL warehouse or compute resource. | | catalog | Yes | | Initial catalog for the connection. | | schema | Yes | | Initial schema for the connection. | | enable_multirow_insert_casts | No | true | Enables targeted casts for mixed scalar values in SQLAlchemy-generated multi-row INSERT ... VALUES statements. This avoids Spark inline-table type errors for pandas to_sql(method="multi") with mixed scalar/object columns. Set to false to disable this rewrite. |

For example, to disable targeted multi-row insert casts:

engine = create_engine(
"databricks://token:dapi***@***.cloud.databricks.com"
"?http_path=***&catalog=main&schema=test"
"&enable_multirow_insert_casts=false"
)

Use SQLAlchemy's connect_args for DBAPI connection options that should be passed through to databricks-sql-connector, such as user-agent settings:

engine = create_engine(
"databricks://token:dapi***@***.cloud.databricks.com"
"?http_path=***&catalog=main&schema=test",
connect_args={"user_agent_entry": "My SQLAlchemy App"},
)

Dialect URL parameters control SQLAlchemy compilation behavior and are not forwarded to the DBAPI connector.

Types

The SQLAlchemy type hierarchy contains backend-agnostic type implementations (represented in CamelCase) and backend-specific types (represented in UPPERCASE). The majority of SQLAlchemy's CamelCase types are supported. This means that a SQLAlchemy application using these types should "just work" with Databricks.

|SQLAlchemy Type|Databricks SQL Type| |-|-| `BigInteger`| `BIGINT` `LargeBinary`| (not supported)| `Boolean`| `BOOLEAN` `Date`| `DATE` `DateTime`| `TIMESTAMP_NTZ`| `Double`| `DOUBLE` `Enum`| (not supported)| `Float`| `FLOAT` `Integer`| `INT` `Numeric`| `DECIMAL`| `PickleType`| (not supported)| `SmallInteger`| `SMALLINT` `String`| `STRING`| `Text`| `STRING`| `Time`| `STRING`| `Unicode`| `STRING`|…

Excerpt shown — open the source for the full document.