Repo · Databricks (DBRX) · published Mar 21, 2022

databricks/run-notebook

TypeScript


databricks/run-notebook

Language: TypeScript

License: Apache-2.0

Stars: 62

Forks: 20

Open issues: 18

Created: 2022-03-21T14:52:33Z

Pushed: 2024-04-15T14:19:27Z

Default branch: main

Fork: no

Archived: no

README:

databricks/run-notebook v0

Overview

Given a Databricks notebook and cluster specification, this Action runs the notebook as a one-time Databricks Job run (docs: AWS | Azure | GCP) and awaits its completion:

  • optionally installing libraries on the cluster before running the notebook
  • optionally configuring permissions on the notebook run (e.g. granting other users permission to view results)
  • optionally triggering the Databricks job run with a timeout
  • optionally using a Databricks job run name
  • setting the notebook output, job run ID, and job run page URL as Action output

  • failing if the Databricks job run fails

You can use this Action to trigger code execution on Databricks for CI (e.g. on pull requests) or CD (e.g. on pushes to master).
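The options listed above map naturally onto a one-time run request. The following is a minimal sketch, not the Action's actual implementation: the `RunOptions` names and the `run-notebook` task key are hypothetical, and the snake_case field names assume the Databricks Jobs API 2.1 `runs/submit` request and `runs/get` state shapes.

```typescript
// Hypothetical option names; the Action's real inputs may differ.
interface RunOptions {
  notebookPath: string;
  existingClusterId?: string;
  newCluster?: Record<string, unknown>;
  libraries?: Record<string, unknown>[];
  accessControlList?: Record<string, unknown>[];
  timeoutSeconds?: number;
  runName?: string;
}

// Field names assumed from the Jobs API 2.1 runs/submit request body.
interface SubmitTask {
  task_key: string;
  notebook_task: { notebook_path: string };
  existing_cluster_id?: string;
  new_cluster?: Record<string, unknown>;
  libraries?: Record<string, unknown>[];
}

interface SubmitPayload {
  tasks: SubmitTask[];
  run_name?: string;
  timeout_seconds?: number;
  access_control_list?: Record<string, unknown>[];
}

function buildRunSubmitPayload(opts: RunOptions): SubmitPayload {
  const hasExisting = opts.existingClusterId !== undefined;
  const hasNew = opts.newCluster !== undefined;
  if (hasExisting === hasNew) {
    throw new Error("Specify exactly one of existingClusterId or newCluster");
  }

  const task: SubmitTask = {
    task_key: "run-notebook", // arbitrary label chosen for this sketch
    notebook_task: { notebook_path: opts.notebookPath },
  };
  if (opts.existingClusterId !== undefined) task.existing_cluster_id = opts.existingClusterId;
  if (opts.newCluster !== undefined) task.new_cluster = opts.newCluster;
  if (opts.libraries !== undefined && opts.libraries.length > 0) task.libraries = opts.libraries;

  // Optional settings are only included when the caller supplied them.
  const payload: SubmitPayload = { tasks: [task] };
  if (opts.runName !== undefined) payload.run_name = opts.runName;
  if (opts.timeoutSeconds !== undefined) payload.timeout_seconds = opts.timeoutSeconds;
  if (opts.accessControlList !== undefined && opts.accessControlList.length > 0) {
    payload.access_control_list = opts.accessControlList;
  }
  return payload;
}

// A run counts as successful only once it has terminated with a SUCCESS result.
function didRunSucceed(state: { life_cycle_state: string; result_state?: string }): boolean {
  return state.life_cycle_state === "TERMINATED" && state.result_state === "SUCCESS";
}
```

A caller would send the payload to the workspace, poll the run state until it reaches a terminal life cycle state, and fail the workflow step when `didRunSucceed` returns false.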

Prerequisites

To use this Action, you need a Databricks REST API token to trigger notebook execution and await completion. The API token must be associated with a principal with the following permissions:

  • Cluster permissions (AWS | Azure | GCP): the "Allow unrestricted cluster creation" entitlement, if running the notebook against a new cluster (recommended), or "Can restart" permission, if running the notebook against an existing cluster.

  • Workspace permissions (AWS | Azure | GCP):

      • If supplying local-notebook-path with one of the git-commit, git-tag, or git-branch parameters, no workspace permissions are required. However, your principal must have Git integration configured (AWS | Azure | GCP). You can associate git credentials with your principal by creating a git credential entry using your principal's API token.

      • If supplying the local-notebook-path parameter, "Can manage" permissions on the directory specified by the workspace-temp-dir parameter (the /tmp/databricks-github-actions directory if workspace-temp-dir is unspecified).

      • If supplying the workspace-notebook-path parameter, "Can read" permissions on the specified notebook.
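The workspace permission rules above amount to a small decision table. The sketch below encodes them as a pure function for illustration; the `NotebookInputs` and `WorkspaceRequirement` types are hypothetical and only mirror the parameter names used in this README.

```typescript
// Hypothetical camelCase mirror of the Action parameters named above.
interface NotebookInputs {
  localNotebookPath?: string;
  workspaceNotebookPath?: string;
  gitCommit?: string;
  gitTag?: string;
  gitBranch?: string;
  workspaceTempDir?: string;
}

type WorkspaceRequirement =
  | { kind: "git-integration" }
  | { kind: "can-manage"; directory: string }
  | { kind: "can-read"; notebook: string };

// Default directory stated in the README for an unspecified workspace-temp-dir.
const DEFAULT_TEMP_DIR = "/tmp/databricks-github-actions";

function requiredWorkspaceAccess(inputs: NotebookInputs): WorkspaceRequirement {
  const hasGitRef =
    inputs.gitCommit !== undefined ||
    inputs.gitTag !== undefined ||
    inputs.gitBranch !== undefined;

  if (inputs.localNotebookPath !== undefined) {
    // With a git ref, no workspace permission is needed, only Git integration.
    if (hasGitRef) return { kind: "git-integration" };
    // Otherwise the principal needs "Can manage" on the temp directory.
    return { kind: "can-manage", directory: inputs.workspaceTempDir ?? DEFAULT_TEMP_DIR };
  }
  if (inputs.workspaceNotebookPath !== undefined) {
    return { kind: "can-read", notebook: inputs.workspaceNotebookPath };
  }
  throw new Error("Supply local-notebook-path or workspace-notebook-path");
}
```

Checking the requirement up front, before triggering a run, gives a clearer failure than a permission error surfacing partway through the workflow.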

We recommend that you store the Databricks REST API token in GitHub Actions secrets to pass it into your GitHub Workflow. The following section lists recommended approaches for token creation by cloud.

Note: we recommend that you do not run this Action against workspaces with IP restrictions. GitHub-hosted action runners have a wide range of IP addresses, making it difficult to whitelist.

AWS

For security reasons, we recommend creating and using a Databricks service principal API token. You can create a service principal, grant the Service Principal token usage permissions, and generate an API token on its behalf.

Azure

For security reasons, we recommend using a Databricks service principal AAD token.

Create an Azure Service Principal

Here are two ways that you can create an Azure Service Principal.

The first way is via the Azure Portal UI. See the Azure Databricks documentation. Record the Application (client) Id, Directory (tenant) Id, and client secret values generated by the steps.

The second way is via the Azure CLI. You can follow the instructions below:

  • Install the Azure CLI
  • Run az login to authenticate with Azure
  • Run az ad sp create-for-rbac -n <service-principal-name> --scopes /subscriptions/<subscription-id>/resourceGroups/<resource-group-name> --sdk-auth --role contributor, specifying the subscription and resource group of your Azure Databricks workspace, to create a service principal and client secret.

From the resulting JSON output, record the following values:

  • clientId: this is the client or application Id of your service principal.
  • clientSecret: this is the client secret of your service principal.
  • tenantId: this is the tenant or directory Id of your service principal.
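If you script this step, the three values can be pulled out of the CLI output programmatically. The sketch below assumes the JSON is available as a string and contains the `clientId`, `clientSecret`, and `tenantId` keys listed above; the function and type names are hypothetical.

```typescript
interface ServicePrincipalCredentials {
  clientId: string;
  clientSecret: string;
  tenantId: string;
}

// Parses the CLI's JSON output and fails loudly if an expected field is missing.
function parseServicePrincipalJson(raw: string): ServicePrincipalCredentials {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Expected a JSON object");
  }
  const record = parsed as Record<string, unknown>;

  const read = (key: string): string => {
    const value = record[key];
    if (typeof value !== "string" || value.length === 0) {
      throw new Error(`Missing or empty field: ${key}`);
    }
    return value;
  };

  // Any other keys in the output are ignored.
  return {
    clientId: read("clientId"),
    clientSecret: read("clientSecret"),
    tenantId: read("tenantId"),
  };
}
```

Treat the parsed `clientSecret` like any other credential: avoid logging it and store it only in a secret store.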

After you create an Azure Service Principal, you should add it to your Azure Databricks workspace using the SCIM API. Use the client or application Id of your service principal as the applicationId of the service principal in the add-service-principal payload.
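As an illustration of the add-service-principal payload, here is a minimal sketch. The endpoint path, the schema URN, and the `displayName` field are assumptions based on the Databricks SCIM API as commonly documented, and the helper name is hypothetical; check the SCIM API reference for your workspace before relying on them.

```typescript
// Assumed SCIM schema URN for Databricks service principals.
const SCIM_SERVICE_PRINCIPAL_SCHEMA =
  "urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal";

interface AddServicePrincipalRequest {
  url: string;
  body: {
    schemas: string[];
    applicationId: string;
    displayName: string;
  };
}

// Builds the request without sending it, so it can be inspected or tested.
function buildAddServicePrincipalRequest(
  workspaceUrl: string,
  applicationId: string,
  displayName: string
): AddServicePrincipalRequest {
  if (applicationId.length === 0) {
    throw new Error("applicationId must be the service principal's client or application Id");
  }
  // Strip trailing slashes so the path joins cleanly.
  const base = workspaceUrl.replace(/\/+$/, "");
  return {
    // Assumed endpoint path for the SCIM ServicePrincipals resource.
    url: `${base}/api/2.0/preview/scim/v2/ServicePrincipals`,
    body: {
      schemas: [SCIM_SERVICE_PRINCIPAL_SCHEMA],
      applicationId,
      displayName,
    },
  };
}
```

The request would be sent as a POST authenticated with an admin's API token; the key point from the text above is that `applicationId` carries the client or application Id recorded earlier.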

Use the Service Principal in your GitHub Workflow

  • Store your service principal credentials in your GitHub repository secrets. Store the Application (client) Id as AZURE_SP_APPLICATION_ID, the Directory (tenant) Id as AZURE_SP_TENANT_ID, and the client secret as AZURE_SP_CLIENT_SECRET.
  • Add the following step at the start of your GitHub workflow.…
