Repo | Snowflake (Arctic) | published Apr 14, 2026

Snowflake-Labs/sfguide-lakehouse-iceberg-production-pipelines


Description: The companion repo for the lakehouse-iceberg-production-pipelines quickstart

Language: Python

License: Apache-2.0

Stars: 1

Forks: 1

Open issues: 0

Created: 2026-04-14T12:07:46Z

Pushed: 2026-05-27T05:01:04Z

Default branch: main

Fork: no

Archived: no

README:

Lakehouse Transformations: Build Production Pipelines for your Iceberg Tables

Stop pipeline sprawl and avoid the cost of data duplication. This lab shows how to run secure, in-place transformations across your entire data estate: connect externally managed Iceberg tables through Catalog Linked Databases so you always work on fresh data without ETL; build efficient, declarative pipelines with Dynamic Tables for Iceberg while preserving multi-engine access to your data; and implement business continuity so your production data is always available.

> The companion Snowflake Quickstart walks through the same steps in a guided format. Link will be added on publish.

Architecture

---
config:
theme: mc
layout: elk
---
flowchart TB
subgraph Generation["Data Generation"]
PyGen["Streaming<br>Balloon Pop Events"]
end
subgraph AWS["AWS"]
GlueCat["Glue Data Catalog balloon_game_events table"]
S3["S3 Warehouse<br>s3://balloon_pops/iceberg/"]
LF["Lake Formation"]
end
subgraph Snowflake["Snowflake"]
CI["Catalog Integration<br>Glue Iceberg REST + SigV4"]
CLD["Catalog Linked Database (CLD)<br>balloon_game_events"]
DTs["Dynamic Iceberg Tables<br>silver pipelines"]
ExtVol["Snowflake Storage<br>(PuPr)"]
SiS["Streamlit in Snowflake"]
HIRC["Horizon Iceberg REST Catalog<br>(PuPr)"]
end
PyGen -- PyIceberg write --> GlueCat
GlueCat --> S3
CI --> CLD
CLD --> DTs
DTs -- writes Iceberg --> ExtVol
DTs --> SiS
DTs -.-> HIRC
HIRC <--> DuckDB["DuckDB<br>Cross-Engine Access"]
LF -- vended credentials --> CI
LF --> S3

Lab Layers

| Layer | Technology | What it does |
|-------|------------|--------------|
| Bronze | Glue + S3 + PyIceberg | Loads raw game events as Iceberg in AWS |
| Catalog | Snowflake Catalog Integration | Connects Snowflake to Glue Iceberg REST with SigV4 + LF vended credentials |
| CLD | Catalog-Linked Database | Mirrors Glue namespaces and tables as Snowflake schemas, with no data copy |
| Silver | Dynamic Iceberg Tables | Transforms JSON bronze into 5 aggregation tables; writes Iceberg back to S3 |
| Dashboard | Streamlit in Snowflake | Live dashboard over silver DTs; zero local server |
| Cross-engine | DuckDB via HIRC | Queries silver Iceberg tables through Snowflake's Horizon REST Catalog |

---

Prerequisites

Clone the Repository

git clone https://github.com/Snowflake-Labs/sfguide-lakehouse-iceberg-production-pipelines.git
cd sfguide-lakehouse-iceberg-production-pipelines

Accounts and Permissions

  • AWS account with a named profile (AWS_PROFILE) that can create and update Glue databases, manage IAM roles, and access S3
  • Snowflake account with ACCOUNTADMIN or a role with CREATE INTEGRATION, CREATE DATABASE, and CREATE STREAMLIT privileges
  • Snowflake CLI connection configured for that account; snow connection list and snow connection test should both succeed

Required Tools

This repo targets Python 3.12+. uv manages the interpreter and all dependencies.

| Tool | Role | macOS | Linux (Debian/Ubuntu) | Windows |
|------|------|-------|-----------------------|---------|
| Git | Clone the repository | brew install git | sudo apt install git | Git for Windows |
| uv | Python deps and uv run entrypoints | brew install uv | Astral installer | PowerShell installer |
| Task | task bronze:*, task check-tools | brew install go-task | Install script | scoop install task |
| AWS CLI v2 | Glue, S3, STS; S3 Tables needs v2.34+ | brew install awscli | AWS bundled installer | AWS MSI |
| Snowflake CLI | Snowflake steps; also available via uv sync | Snowflake CLI docs | Snowflake CLI docs | Snowflake CLI docs |
| envsubst | Renders IAM policy templates (gettext package) | brew install gettext | sudo apt install gettext-base | WSL2 recommended |
| jq | JSON checks at the shell | brew install jq | sudo apt install jq | scoop install jq |

Windows note: If task check-tools fails only on envsubst, use WSL2 or run uv run bronze-cli render-iam instead.

Recommended Tools

| Tool | Why | macOS | Linux | Windows |
|------|-----|-------|-------|---------|
| direnv | Auto-loads .env when you cd into the repo | brew install direnv | sudo apt install direnv | WSL2 |
| curl | Scripts and health checks | pre-installed | pre-installed | curl.se |
| openssl | TLS and crypto one-liners | pre-installed | pre-installed | OpenSSL binaries |

Verify Installation

Sync Python dependencies:

uv sync

Set your AWS profile and run the prerequisite check:

export AWS_PROFILE=your-profile
task check-tools

task check-tools runs tools/check_lab_prereqs.py: it verifies required binaries on PATH and runs aws sts get-caller-identity. Fix any missing entries and refresh credentials if STS fails, then re-run until you see All required tools are available.
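For readers who want to see what a PATH check of this kind involves, here is a minimal sketch. It is not the contents of tools/check_lab_prereqs.py, which this excerpt does not show. The binary names in REQUIRED_TOOLS are assumptions inferred from the Required Tools table, the function name find_missing_tools is hypothetical, and the STS call is left out.

```python
import shutil

# Assumed binary names, inferred from the Required Tools table.
# The repo's own script may check a different list.
REQUIRED_TOOLS = ["git", "uv", "task", "aws", "snow", "envsubst", "jq"]


def find_missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be resolved on PATH, in input order."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required tools are available")
```

Passing the lookup function as a parameter keeps the check easy to exercise without touching the real PATH.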

---

Environment Setup

Copy .env.example to .env and fill in your values. Never commit .env.

cp .env.example .env

Key variables by phase:

| Variable | Phase | Default | Notes |
|----------|-------|---------|-------|
| AWS_PROFILE | 1 | required | AWS named profile for all bronze tasks |
| AWS_REGION | 1 | required | Keeps all API calls in one region |
| LAB_USERNAME | 1 | none | Workshop shared accounts; drives bucket/database name derivation |
| BRONZE_BUCKET_NAME | 1 | derived | S3 warehouse bucket; iceberg/ becomes the Glue warehouse URI |
| BRONZE_S3TABLES_BUCKET_NAME |…
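As a rough illustration of how the variables marked required could be validated before running any bronze task, the sketch below parses a .env file and reports unset entries. It is not part of the repo: parse_env, missing_required, and warehouse_uri are hypothetical helpers. The warehouse URI format is an assumption based on the s3://balloon_pops/iceberg/ example in the architecture diagram, and the rules for deriving bucket names from LAB_USERNAME are not shown in this excerpt, so they are not modeled.

```python
from pathlib import Path

# Variables listed as "required" in the table above.
REQUIRED_VARS = ("AWS_PROFILE", "AWS_REGION")


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        values[key] = value.strip().strip('"').strip("'")
    return values


def missing_required(values, required=REQUIRED_VARS):
    """Return required variable names that are absent or empty."""
    return [name for name in required if not values.get(name)]


def warehouse_uri(bucket_name):
    """Assumed layout: the iceberg/ prefix under the warehouse bucket."""
    return f"s3://{bucket_name}/iceberg/"


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        env = parse_env(env_path.read_text())
        print("Missing required variables:", missing_required(env))
```

This parser handles only plain assignments; tools such as direnv or a shell that sources .env may accept syntax this sketch ignores.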

Excerpt shown — open the source for the full document.