Snowflake-Labs/sfguide-getting-started-openflow-kafka-connector
Jupyter Notebook
Captured source
source ↗Snowflake-Labs/sfguide-getting-started-openflow-kafka-connector
Description: Companion repo for Quickstart Getting Started with Snowflake Openflow Kafka Connector
Language: Jupyter Notebook
License: NOASSERTION
Stars: 1
Forks: 1
Open issues: 1
Created: 2025-10-16T04:46:28Z
Pushed: 2025-10-17T14:05:59Z
Default branch: main
Fork: no
Archived: no
README:
Getting Started with Snowflake Openflow Kafka Connector
Companion repository for the Snowflake Openflow Kafka Connector Quickstart.
Overview
Build a real-time streaming pipeline from Apache Kafka to Snowflake using Openflow. Stream application logs from Kafka to Snowflake with automatic schema detection, schema evolution, and perform powerful SQL analytics including semantic search with Cortex Search.
📖 [Follow the Complete Quickstart Guide](https://quickstarts.snowflake.com/guide/getting_started_with_openflow_kafka_connector/index.html)
Repository Contents
. ├── README.md # This file ├── RPK_CLI_README.md # Detailed RPK CLI setup guide ├── Taskfile.yml # Task runner for common operations ├── LICENSE.txt # Apache 2.0 license ├── pyproject.toml # Python project dependencies ├── .env.template # Environment variable template ├── sql/ │ ├── 1.snowflake_setup.sql # Snowflake environment setup │ ├── 2.verify_ingestion.sql # Data ingestion verification │ ├── 2a.verify_base_schema.sql # Verify base schema ingestion │ ├── 2b.verify_schema_evolution.sql # Verify schema evolution │ ├── 3.analytics_queries.sql # Example analytics queries │ ├── 4.cortex_search.sql # Semantic search with Cortex Search │ └── 5.cleanup.sql # Cleanup script └── sample-data/ ├── sample_logs.json # 50 base schema sample records ├── sample_logs_evolved.json # 80 evolved schema sample records └── generate_logs.py # Python log generator script
Quick Start
1. Clone and Setup
git clone https://github.com/Snowflake-Labs/sfguide-getting-started-openflow-kafka-connector.git cd sfguide-getting-started-openflow-kafka-connector export QUICK_START_REPO=$PWD
2. Install Task Runner (Optional)
# macOS brew install go-task # Linux sh -c "$(curl --location https://taskfile.dev/install.sh)" -- -d -b ~/.local/bin # See all available tasks task list
3. Follow the Quickstart Guide
👉 [Complete Step-by-Step Instructions](https://quickstarts.snowflake.com/guide/getting_started_with_openflow_kafka_connector/index.html)
The quickstart guide walks you through:
- Setting up Kafka and rpk CLI
- Configuring Snowflake (database, network rules, External Access Integration)
- Deploying Openflow SPCS runtime
- Configuring the Kafka connector with automatic schema evolution
- Generating and streaming logs
- Running analytics and Cortex Search queries
Common Tasks
Generate Sample Files (No Kafka Required)
task generate-samples # Generate both base and evolved sample files task generate-base # Generate 50 base schema records task generate-evolved # Generate 80 evolved schema records
Test Kafka Connection
# Setup environment cp .env.template .env # Edit with your Kafka credentials task test-kafka # Verify connection
Produce Logs to Kafka
# Using Python generator task produce-base # Produce 50 base schema logs task produce-evolved # Produce 80 evolved schema logs # Using rpk (faster) task rpk-produce-base task rpk-produce-evolved # Complete demo workflow task demo-full # Run Phase 1 + Phase 2
Kafka Topic Management
task kafka-create-topic # Create application-logs topic task kafka-topics # List all topics task kafka-cluster-info # Show cluster information
Prerequisites
- Snowflake Account: Enterprise account with Openflow SPCS enabled
- Kafka Cluster: Access to Kafka (Confluent Cloud, AWS MSK, GCP Managed Kafka, or self-hosted)
- Redpanda CLI (rpk): For Kafka operations (install guide)
- 📖 See [RPK_CLI_README.md](RPK_CLI_README.md) for detailed setup with Confluent Cloud
- Python 3.7+: Optional, for the log generator script
- Task Runner: Optional, but recommended for simplified command execution
Sample Data
`sample-data/sample_logs.json` (50 records):
- Base schema with 11 fields (timestamp, level, service, host, message, etc.)
- Web API requests, authentication, database operations, payments
`sample-data/sample_logs_evolved.json` (80 records):
- Evolved schema with 29 fields
- Includes additional fields: region, trace_id, auth_method, currency, payment_method, and more
- Contains system metrics (memory_percent, disk_usage_percent) for Cortex Search examples
`sample-data/generate_logs.py`:
- Generate custom log files:
python generate_logs.py --count 100 --output my_logs.json - Test Kafka connection:
python generate_logs.py --test-connection - Produce to Kafka:
python generate_logs.py --count 100 - See
--helpfor all options
SQL Scripts
All SQL scripts are referenced in the quickstart guide:
- `1.snowflake_setup.sql` - Create role, database, warehouse, network rules, External Access Integration
- `2.verify_ingestion.sql` - Verify data ingestion and check record counts
- `2a.verify_base_schema.sql` - Validate base schema (11 fields)
- `2b.verify_schema_evolution.sql` - Verify evolved schema (29 fields) and automatic column detection
- `3.analytics_queries.sql` - Advanced analytics examples (log analysis, time-series, performance metrics)
- `4.cortex_search.sql` - Semantic search queries with Cortex Search
- `5.cleanup.sql` - Clean up all Snowflake resources
Resources
Quickstart & Documentation:
- Openflow Kafka Connector Quickstart ⭐
- Openflow Documentation
- Kafka Connector Documentation
- [rpk CLI Setup Guide](RPK_CLI_README.md)
Related Quickstarts:
- Getting Started with Openflow SPCS
- [Getting Started with PostgreSQL…
Excerpt shown — open the source for the full document.
Notability
notability 1.0/10Routine repo with minimal traction