What does this repo signal mean?

Snowflake (Arctic) published Snowflake-Labs/apache-iceberg-from-zero, a Jupyter Notebook repository. A repo signal like this can expose tooling, eval, infrastructure, or model-adjacent work before it appears in a launch post. High-signal details: repo Snowflake-Labs/apache-iceberg-from-zero · language Jupyter Notebook · low-traction new repo from Snowflake. onlylabs links this event to 1 captured evidence page and 6 related repo signals.

Snowflake (Arctic) Repo: Snowflake-Labs/apache-iceberg-from-zero

Captured source

GitHub: github.com/Snowflake-Labs/apache-iceberg-from-zero

Snowflake-Labs/apache-iceberg-from-zero repository metadata

published Mar 3, 2026 · seen Jun 5 · captured Jun 11 · http 200 · method plain

Snowflake-Labs/apache-iceberg-from-zero

Language: Jupyter Notebook

License: Apache-2.0

Stars: 7

Forks: 11

Open issues: 0

Created: 2026-03-03T18:25:28Z

Pushed: 2026-03-27T21:13:10Z

Default branch: main

Fork: no

Archived: no

README:

Apache Iceberg Course - Docker Setup

This Docker setup provides a complete, production-like environment for learning Apache Iceberg with:

MinIO: A local S3-compatible object storage for table data
Polaris: Apache Iceberg REST Catalog
Jupyter Notebook: Interactive Python notebook with PySpark and Iceberg support
Trino: Distributed SQL query engine

You should have found this repository along with the course videos here (TODO LINK); please check them out if you haven't.

Version Configuration

All versions are centrally managed in the .env file.

Current pinned versions:

Iceberg: 1.10.0 (released September 5, 2025)
Spark: 4.0.1 with Scala 2.13 (September 2, 2025)
Polaris: latest
Trino: 465

To update versions, simply edit the .env file and rebuild:

docker compose up -d --build
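
Because the versions live in a single .env file, a script or notebook cell can read them back to confirm what the stack was built with. The following is a minimal sketch of a parser for simple KEY=VALUE lines. The key names in the sample (ICEBERG_VERSION, SPARK_VERSION, and so on) are assumptions for illustration and may not match the names used in the repository's actual .env.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env(path: str = ".env") -> dict[str, str]:
    """Read and parse an env file from disk."""
    return parse_env(Path(path).read_text(encoding="utf-8"))


# Hypothetical key names; check the real .env for the actual ones.
SAMPLE = """
# pinned versions
ICEBERG_VERSION=1.10.0
SPARK_VERSION=4.0.1
SCALA_VERSION=2.13
POLARIS_VERSION=latest
TRINO_VERSION=465
"""

versions = parse_env(SAMPLE)
for key, value in versions.items():
    print(f"{key} = {value}")
```

This sketch does not handle variable interpolation or multi-line values, which Docker Compose env files can contain; for the flat version pins described here that is probably sufficient.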

Prerequisites

Docker Desktop installed and running
At least 8GB of RAM allocated to Docker
At least 10GB of free disk space

Quick Start

1. Start all services:

docker compose up -d

2. Wait for services to be ready (approximately 1-2 minutes):

docker compose logs -f

Press Ctrl+C to stop following logs once services are running.

3. Access the services:

Jupyter Notebook: http://localhost:8888 (no password) - Start here!
MinIO Console: http://localhost:9001 (admin/password) - View your data
Trino UI: http://localhost:8080 (username: admin, no password)
Polaris API: http://localhost:8181

4. Open the demo notebook:

Direct link: http://localhost:8888/lab/tree/work/E1.1%20-%20OpenLakehouse.ipynb
Run through the cells to see Iceberg with Polaris and MinIO
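
As an alternative to watching the logs in step 2, the wait can be scripted. The sketch below polls the four published ports from step 3 until each accepts a TCP connection or a deadline passes. The function names and the default deadline are illustrative choices, not part of the repository.

```python
import socket
import time


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_services(services: dict[str, int], host: str = "localhost",
                      deadline_s: float = 120.0,
                      interval_s: float = 2.0) -> dict[str, bool]:
    """Poll each port until all accept connections or the deadline passes."""
    status = {name: False for name in services}
    end = time.monotonic() + deadline_s
    while True:
        for name, port in services.items():
            if not status[name]:
                status[name] = port_open(host, port)
        if all(status.values()) or time.monotonic() >= end:
            return status
        time.sleep(interval_s)


# Ports taken from the Quick Start list above.
SERVICES = {"jupyter": 8888, "minio-console": 9001, "trino": 8080, "polaris": 8181}

# Example call (blocks for up to two minutes):
# for name, up in wait_for_services(SERVICES).items():
#     print(f"{name}: {'ready' if up else 'not reachable'}")
```

An open port only shows that something is listening; a service may still be initializing behind it, so the logs remain the authoritative check.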

Service Details

MinIO (S3-Compatible Storage)

MinIO provides S3-compatible object storage for Iceberg table data.

Configuration:

API Port: 9000
Console Port: 9001
Username: admin
Password: password
Bucket: warehouse
Data directory: ./data/minio

Access the Console:

URL: http://localhost:9001
Login with admin/password
Browse the warehouse bucket to see your Iceberg table files

Polaris Iceberg REST Catalog

The Polaris catalog provides a REST API for managing Iceberg table metadata. It's configured with in-memory persistence for Catalog entries and MinIO for table metadata.

Configuration:

Port: 8181
Data directory: ./data/polaris
Storage: MinIO S3 (s3://warehouse/)
OAuth2 credentials automatically generated on first start

Initialization: The Polaris catalog is automatically initialized with:

S3 storage configuration pointing to MinIO
OAuth2 credentials (root:s3cr3t defined in .env)

The polaris-setup service runs bootstrap-catalog.sh on startup to configure the catalog.
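
To illustrate how the OAuth2 credentials are used, the sketch below builds a client-credentials token request with the Python standard library. It assumes that root:s3cr3t maps to client_id root and client_secret s3cr3t, that the token endpoint is /api/catalog/v1/oauth/tokens, and that the scope is PRINCIPAL_ROLE:ALL. These follow common Polaris quickstart conventions but are not confirmed by this README, so check bootstrap-catalog.sh for the values actually used.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(base_url: str, client_id: str, client_secret: str,
                        scope: str = "PRINCIPAL_ROLE:ALL") -> urllib.request.Request:
    """Build (but do not send) an OAuth2 client-credentials token request."""
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,  # assumed scope value
    }).encode("ascii")
    return urllib.request.Request(
        # Assumed endpoint path; verify against the running Polaris version.
        url=f"{base_url.rstrip('/')}/api/catalog/v1/oauth/tokens",
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_token(request: urllib.request.Request, timeout: float = 10.0) -> str:
    """Send the request and return the access token from the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["access_token"]


request = build_token_request("http://localhost:8181", "root", "s3cr3t")
print(request.get_method(), request.full_url)
```

With the stack running, fetch_token(request) would return a bearer token to pass in an Authorization header on later catalog calls.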

Trino

Trino is configured with an Iceberg connector that connects to the Polaris catalog.

Configuration:

Port: 8080
Catalog: iceberg (connected to Polaris)
Data directory: ./data/trino
Config files: ./trino/config/

Connect to Trino CLI:

docker exec -it trino trino

Example Trino queries:

-- Show catalogs
SHOW CATALOGS;

-- Create a namespace
CREATE SCHEMA iceberg.demo;

-- Show schemas
SHOW SCHEMAS IN iceberg;

-- Create a table
CREATE TABLE iceberg.demo.test (
    id BIGINT,
    name VARCHAR
) WITH (format = 'PARQUET');

-- Insert data
INSERT INTO iceberg.demo.test VALUES (1, 'Alice'), (2, 'Bob');

-- Query data
SELECT * FROM iceberg.demo.test;
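
The same statements can be run from a notebook through the Trino Python client, which the Jupyter image is described as including. The sketch below is one way to do that, assuming the `trino` package's DB-API interface and the host, port, and user from the Quick Start. It adds IF NOT EXISTS so the script can be rerun, which is a deviation from the queries above, and it drops the trailing semicolons because the client sends one statement per call.

```python
from typing import Any, Iterable

STATEMENTS = [
    "CREATE SCHEMA IF NOT EXISTS iceberg.demo",
    "CREATE TABLE IF NOT EXISTS iceberg.demo.test "
    "(id BIGINT, name VARCHAR) WITH (format = 'PARQUET')",
    "INSERT INTO iceberg.demo.test VALUES (1, 'Alice'), (2, 'Bob')",
    "SELECT * FROM iceberg.demo.test",
]


def run_statements(cursor: Any, statements: Iterable[str]) -> list:
    """Execute statements in order on a DB-API cursor; return the last result rows."""
    rows: list = []
    for sql in statements:
        cursor.execute(sql)
        # Fetching drains the result so each statement finishes before the next.
        rows = cursor.fetchall()
    return rows


def connect_trino(host: str = "localhost", port: int = 8080, user: str = "admin"):
    """Open a connection with the Trino Python client (imported lazily)."""
    from trino.dbapi import connect  # requires the `trino` package
    return connect(host=host, port=port, user=user, catalog="iceberg")


# Example call, with the stack running:
# rows = run_statements(connect_trino().cursor(), STATEMENTS)
# print(rows)
```

From inside the Jupyter container the host would likely be the Trino service name on the Compose network rather than localhost; that name is not given in this excerpt.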

Jupyter Notebook with PySpark

The Jupyter environment comes pre-configured with:

PySpark with Iceberg support
PyIceberg library
Trino Python client
Pandas, Matplotlib, Seaborn

Access:

URL: http://localhost:8888
Notebooks directory: ./notebooks
Data directory: ./data/jupyter

Catalog Configuration:

The demo notebook uses the Polaris REST catalog with MinIO S3 storage
Metadata managed by Polaris (centralized, REST API)
Table data stored in MinIO (s3://warehouse/)
This is a production-like pattern - same architecture as using Polaris with real S3/Azure/GCS
OAuth2 authentication configured automatically
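
The pattern described above, a REST catalog for metadata plus S3-compatible storage for data, comes down to a handful of Spark catalog properties. The sketch below assembles them as a plain dictionary. The property keys are standard Iceberg Spark settings, but every value passed in the example (the catalog name polaris, the service hostnames, the warehouse name, the scope) is an assumption; the notebook's actual configuration may differ.

```python
def polaris_catalog_conf(name: str, uri: str, credential: str,
                         warehouse: str, s3_endpoint: str) -> dict[str, str]:
    """Return Spark properties for an Iceberg REST catalog backed by S3-style storage."""
    prefix = f"spark.sql.catalog.{name}"
    return {
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": uri,
        f"{prefix}.credential": credential,      # client_id:client_secret
        f"{prefix}.scope": "PRINCIPAL_ROLE:ALL",  # assumed scope
        f"{prefix}.warehouse": warehouse,
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{prefix}.s3.endpoint": s3_endpoint,
        # MinIO is usually addressed by path rather than virtual-host style.
        f"{prefix}.s3.path-style-access": "true",
    }


# Hypothetical values for the Compose network; adjust to the real service names.
conf = polaris_catalog_conf(
    name="polaris",
    uri="http://polaris:8181/api/catalog",
    credential="root:s3cr3t",
    warehouse="warehouse",
    s3_endpoint="http://minio:9000",
)

# Applying it in a notebook would look roughly like:
# builder = SparkSession.builder.appName("iceberg-demo")
# for key, value in conf.items():
#     builder = builder.config(key, value)
# spark = builder.getOrCreate()
```

Keeping the settings in one function makes it easy to see that swapping MinIO for real S3, Azure, or GCS mainly changes the endpoint and storage properties, while the catalog wiring stays the same.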

Sample notebook: E1.1 - OpenLakehouse.ipynb is provided with examples of:

Creating Iceberg tables
Querying data
ACID transactions
Time travel
Schema evolution
Partitioning

Data Persistence

All data is stored locally in the ./data directory:

./data/minio: MinIO object storage (Iceberg table data)
./data/polaris: Polaris catalog metadata
./data/trino: Trino working data
./data/jupyter: Jupyter user data

This ensures that your data persists even when containers are stopped.
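
Since everything accumulates under ./data, it can be useful to see how much space each service is using, especially given the 10GB disk prerequisite. The sketch below sums file sizes per subdirectory; the function names are illustrative and the directory names are the ones listed above.

```python
from pathlib import Path


def dir_size_bytes(root: Path) -> int:
    """Sum the sizes of all regular files under root (0 if root is missing)."""
    if not root.is_dir():
        return 0
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def data_usage(data_dir: str = "./data",
               services: tuple[str, ...] = ("minio", "polaris", "trino", "jupyter"),
               ) -> dict[str, int]:
    """Return bytes used by each service's directory under data_dir."""
    base = Path(data_dir)
    return {name: dir_size_bytes(base / name) for name in services}


# Example: print usage in MiB for the default ./data layout.
for name, size in data_usage().items():
    print(f"{name:8s} {size / 2**20:8.1f} MiB")
```

Directories that do not exist yet are reported as zero, so the snippet can be run before the first `docker compose up`.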

Common Commands

Start all services:

docker compose up -d

Stop all services:

docker compose down

View logs:

# All services
docker compose logs -f

# Specific service
docker compose logs -f jupyter
docker compose logs -f trino
docker compose logs -f polaris

Restart a service:

docker compose restart jupyter

Rebuild and restart (after config changes):

docker compose up -d --build

Troubleshooting

Services not starting

Check if ports are already in use:

# macOS/Linux
lsof -i :8888 # Jupyter
lsof -i :8080 # Trino
lsof -i :9000 # MinIO API
lsof -i :9001 # MinIO Console
lsof -i :8181 # Polaris
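
On systems without lsof (Windows, for example), a similar check can be done from Python before starting the stack. The sketch below tries to bind each port and reports the ones that are already taken. It is an approximation: a bind can also fail for reasons other than a running service, such as a socket lingering after a recent shutdown.

```python
import socket

# Ports from the lsof list above.
PORTS = {
    "Jupyter": 8888,
    "Trino": 8080,
    "MinIO API": 9000,
    "MinIO Console": 9001,
    "Polaris": 8181,
}


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if this process can bind host:port, i.e. nothing else holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def find_conflicts(ports: dict[str, int], host: str = "0.0.0.0") -> list[str]:
    """Return the names of services whose ports are already in use."""
    return [name for name, port in ports.items() if not port_is_free(port, host)]


# Example call, run before `docker compose up -d`:
# print(find_conflicts(PORTS) or "no port conflicts")
```

Unlike lsof, this does not say which process owns a busy port; it only narrows down which service will fail to start.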

Check service health

# Check if containers are running
docker compose ps

# Check specific service logs
docker compose logs jupyter

Connection refused errors

Make sure all services are fully started (check logs)
Services may take 1-2 minutes to initialize
Verify network connectivity: docker network ls

Spark Session: `ConnectionRefusedError: [Errno 111] Connection refused`

If you see this error when initializing a Spark Session in a notebook, the Spark Connect server may have failed to start. Check the Docker container logs (docker logs jupyter-spark) for details. Common causes include insufficient Docker RAM or port conflicts. You can also try restarting the container (docker compose restart jupyter).

SSL / Corporate Proxy Errors Downloading JARs

If you are on a corporate network with a proxy or firewall that...

Excerpt shown — open the source for the full document.

Notability

notability 3.0/10

Low traction new repo from Snowflake
