databricks/databricks-sqlalchemy
Python
Captured source
source ↗databricks/databricks-sqlalchemy
Description: See PECO-1396 for more details about this repository.
Language: Python
License: Apache-2.0
Stars: 24
Forks: 16
Open issues: 23
Created: 2024-01-18T21:55:41Z
Pushed: 2026-06-03T20:33:46Z
Default branch: main
Fork: no
Archived: no
README:
Databricks dialect for SQLALchemy 2.0
The Databricks dialect for SQLAlchemy serves as bridge between SQLAlchemy and the Databricks SQL Python driver. A working example demonstrating usage can be found in sqlalchemy_example.py.
Installation
To install the dialect and its dependencies:
pip install databricks-sqlalchemy
If you also plan to use alembic you can alternatively run:
pip install alembic
Connection String
Every SQLAlchemy application that connects to a database needs to use an Engine, which you can create by passing a connection string to create_engine. The connection string must include these components:
1. Host 2. HTTP Path for a compute resource 3. API access token 4. Initial catalog for the connection 5. Initial schema for the connection
Note: Our dialect is built and tested on workspaces with Unity Catalog enabled. Support for the `hive_metastore` catalog is untested.
For example:
import os
from sqlalchemy import create_engine
host = os.getenv("DATABRICKS_SERVER_HOSTNAME")
http_path = os.getenv("DATABRICKS_HTTP_PATH")
access_token = os.getenv("DATABRICKS_TOKEN")
catalog = os.getenv("DATABRICKS_CATALOG")
schema = os.getenv("DATABRICKS_SCHEMA")
engine = create_engine(
f"databricks://token:{access_token}@{host}?http_path={http_path}&catalog={catalog}&schema={schema}"
)Connection URL parameters and connect_args
The Databricks SQLAlchemy dialect accepts dialect-specific options in the SQLAlchemy connection URL query string:
| Parameter | Required | Default | Description | |-|-|-|-| | http_path | Yes | | HTTP path for the Databricks SQL warehouse or compute resource. | | catalog | Yes | | Initial catalog for the connection. | | schema | Yes | | Initial schema for the connection. | | enable_multirow_insert_casts | No | true | Enables targeted casts for mixed scalar values in SQLAlchemy-generated multi-row INSERT ... VALUES statements. This avoids Spark inline-table type errors for pandas to_sql(method="multi") with mixed scalar/object columns. Set to false to disable this rewrite. |
For example, to disable targeted multi-row insert casts:
engine = create_engine( "databricks://token:dapi***@***.cloud.databricks.com" "?http_path=***&catalog=main&schema=test" "&enable_multirow_insert_casts=false" )
Use SQLAlchemy's connect_args for DBAPI connection options that should be passed through to databricks-sql-connector, such as user-agent settings:
engine = create_engine(
"databricks://token:dapi***@***.cloud.databricks.com"
"?http_path=***&catalog=main&schema=test",
connect_args={"user_agent_entry": "My SQLAlchemy App"},
)Dialect URL parameters control SQLAlchemy compilation behavior and are not forwarded to the DBAPI connector.
Types
The SQLAlchemy type hierarchy contains backend-agnostic type implementations (represented in CamelCase) and backend-specific types (represented in UPPERCASE). The majority of SQLAlchemy's CamelCase types are supported. This means that a SQLAlchemy application using these types should "just work" with Databricks.
|SQLAlchemy Type|Databricks SQL Type| |-|-| `BigInteger`| `BIGINT` `LargeBinary`| (not supported)| `Boolean`| `BOOLEAN` `Date`| `DATE` `DateTime`| `TIMESTAMP_NTZ`| `Double`| `DOUBLE` `Enum`| (not supported)| `Float`| `FLOAT` `Integer`| `INT` `Numeric`| `DECIMAL`| `PickleType`| (not supported)| `SmallInteger`| `SMALLINT` `String`| `STRING`| `Text`| `STRING`| `Time`| `STRING`| `Unicode`| `STRING`|…
Excerpt shown — open the source for the full document.