Snowflake-Labs/sfguide-powering-AI-Ready-Data-Products-by-unpacking-complex-HL7-data-using-Snowflake

PLpgSQL

Language: PLpgSQL

License: Apache-2.0

Stars: 1

Forks: 0

Open issues: 0

Created: 2026-02-20T16:56:50Z

Pushed: 2026-03-19T20:07:07Z

Default branch: main

Fork: no

Archived: no

README:

Unified Healthcare Intelligence Platform

Transform healthcare data into actionable intelligence by combining Population Health Analytics with Cancer Pathology Intelligence in a single, unified platform powered by Snowflake Cortex AI.

Overview

This solution demonstrates a comprehensive healthcare analytics platform that:

  • Integrates Synthea-generated patient data with HL7 v2.7 cancer pathology messages
  • Extracts 20+ structured clinical fields from pathology reports using Claude 3.5 Sonnet
  • Enables natural language queries via a unified Snowflake Intelligence Agent
  • Provides semantic search across pathology reports, medical transcripts, and conditions
  • Visualizes population health metrics alongside cancer pathology insights

Key Capabilities

| Capability | Description |
|------------|-------------|
| Population Health | Patient demographics, encounters, observations, medications, conditions |
| Cancer Pathology | TNM staging, molecular markers (MSI, KRAS, BRAF, HER2, ER, PR) |
| AI Extraction | Claude 3.5 Sonnet extracts structured data from unstructured pathology text |
| Unified Agent | Single agent with 4 tools for structured queries and semantic search |
| Real-time Streaming | HL7 messages via OpenFlow with custom NiFi processor |
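To make the TNM staging field concrete, the sketch below splits a compact staging string such as pT3N1bM0 into its T, N, and M components. It is a regex-based illustration only, not the repository's extraction method (the pipeline uses Claude 3.5 Sonnet for that). The TNM class, TNM_PATTERN, and parse_tnm are hypothetical names, and the pattern covers only the common compact form in which all three components are present.

```python
import re
from typing import NamedTuple, Optional


class TNM(NamedTuple):
    """Components of a compact TNM staging string (hypothetical helper type)."""
    prefix: str  # staging basis, e.g. "c" (clinical), "p" (pathologic), "yp"
    t: str       # primary tumor category, e.g. "3", "4a", "is", "X"
    n: str       # regional lymph node category, e.g. "0", "1b", "X"
    m: str       # distant metastasis category, e.g. "0", "1a", "X"


# Matches strings such as "T2N0M0", "pT3N1bM0", or "ypT2 N0 M0".
TNM_PATTERN = re.compile(
    r"\b(?P<prefix>[cpyr]{0,2})"
    r"T(?P<t>is|X|[0-4][a-d]?)\s*"
    r"N(?P<n>X|[0-3][a-c]?)\s*"
    r"M(?P<m>X|[01][a-c]?)\b"
)


def parse_tnm(text: str) -> Optional[TNM]:
    """Return the first TNM triple found in free text, or None if absent."""
    match = TNM_PATTERN.search(text)
    if match is None:
        return None
    return TNM(match["prefix"], match["t"], match["n"], match["m"])


if __name__ == "__main__":
    print(parse_tnm("Invasive adenocarcinoma, pT3N1bM0."))
```

A check like this could serve as a sanity test on AI-extracted staging values, since an extracted string that fails to parse is worth a second look.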

Architecture

                          ┌─────────────────────────────────────────┐
                          │              DATA SOURCES               │
                          ├─────────────────────────────────────────┤
                          │  Synthea CSVs   HL7 Messages   PDFs     │
                          └──────────┬──────────┬───────────┬───────┘
                                     │          │           │
                          ┌──────────▼──────────▼───────────▼───────┐
                          │              BRONZE LAYER               │
                          │       Landing tables for raw data       │
                          └──────────┬──────────┬───────────────────┘
                                     │          │
                          ┌──────────▼──────────▼───────────────────┐
                          │              SILVER LAYER               │
                          │  Parsed HL7 segments, patient records   │
                          └──────────┬──────────┬───────────────────┘
                                     │          │
                          ┌──────────▼──────────▼───────────────────┐
                          │               GOLD LAYER                │
                          │ Patient 360, Cancer Reports, Analytics  │
                          └──────────┬──────────────────────────────┘
                                     │
           ┌─────────────────────────┼─────────────────────────┐
           │                         │                         │
    ┌──────▼──────┐          ┌───────▼───────┐         ┌───────▼───────┐
    │  Semantic   │          │ Cortex Search │         │   Streamlit   │
    │    View     │          │   Services    │         │   Dashboard   │
    └──────┬──────┘          └───────┬───────┘         └───────────────┘
           │                         │
           └────────────┬────────────┘
                        │
               ┌────────▼────────┐
               │  HL7 PATHOLOGY  │
               │  INTELLIGENCE   │
               │      AGENT      │
               └─────────────────┘

Quick Start

Prerequisites

  • Snowflake account with ACCOUNTADMIN access
  • Cortex AI features enabled
  • Snowflake Intelligence available
  • (Optional) OpenFlow for HL7 streaming

Installation Steps

1. Clone or download this repository

2. Run infrastructure setup

-- Execute in order:
@sql/01_setup_infrastructure.sql
@sql/02_setup_pop_health_tables.sql
@sql/03_setup_pathology_tables.sql
@sql/04_setup_gold_analytics.sql
@sql/05_setup_semantic_view.sql
@sql/06_setup_cortex_search.sql
@sql/07_setup_intelligence_agent.sql

3. Generate HL7 pathology data

python scripts/generate_hl7_for_synthea.py

4. Load data and run pipeline

@sql/08_run_pipeline.sql

5. Deploy Streamlit dashboard

  • Upload streamlit/HL7_PATHOLOGY_DASHBOARD.py to Snowsight
  • Or run it locally with a Snowflake connection
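The ordering in steps 2 through 4 can be sketched as a small plan builder: setup scripts 01 to 07 run first, then the HL7 generator, then 08_run_pipeline.sql. The file names come from this README; numeric_prefix, build_plan, and the (kind, target) tuple shape are hypothetical. The sketch only computes the order and does not connect to Snowflake.

```python
import re

# Script names as listed in this README, deliberately unsorted here.
SQL_SCRIPTS = [
    "08_run_pipeline.sql",
    "03_setup_pathology_tables.sql",
    "01_setup_infrastructure.sql",
    "05_setup_semantic_view.sql",
    "02_setup_pop_health_tables.sql",
    "07_setup_intelligence_agent.sql",
    "04_setup_gold_analytics.sql",
    "06_setup_cortex_search.sql",
]
HL7_GENERATOR = "scripts/generate_hl7_for_synthea.py"


def numeric_prefix(name: str) -> int:
    """Return the leading number of a script name such as '03_setup_x.sql'."""
    match = re.match(r"(\d+)_", name)
    if match is None:
        raise ValueError(f"no numeric prefix in {name!r}")
    return int(match.group(1))


def build_plan(sql_scripts, pipeline_prefix=8):
    """Order the steps: setup SQL, then HL7 generation, then the pipeline SQL."""
    ordered = sorted(sql_scripts, key=numeric_prefix)
    setup = [n for n in ordered if numeric_prefix(n) < pipeline_prefix]
    pipeline = [n for n in ordered if numeric_prefix(n) >= pipeline_prefix]
    plan = [("sql", f"sql/{name}") for name in setup]
    plan.append(("python", HL7_GENERATOR))
    plan.extend(("sql", f"sql/{name}") for name in pipeline)
    return plan


if __name__ == "__main__":
    for step, (kind, target) in enumerate(build_plan(SQL_SCRIPTS), start=1):
        print(f"{step}. [{kind}] {target}")
```

Sorting on the parsed integer rather than the raw string keeps the order stable if a script numbered 10 or higher is ever added.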

Test the Agent

SELECT SNOWFLAKE.CORE.AGENT_QUERY(
'HL7_PATHOLOGY_AI.ANALYTICS.HL7_PATHOLOGY_AGENT',
'How many patients have cancer and what types?'
);
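When the question comes from user input, the statement above needs the question embedded as a safely quoted string. The sketch below builds that statement in Python. The function name and agent path are copied from this README as written and are not independently verified here; sql_string_literal and build_agent_query are hypothetical helpers. Where the client supports bind variables, those are generally preferable to string building.

```python
def sql_string_literal(value: str) -> str:
    """Quote a Python string as a single-quoted SQL string constant."""
    # Backslash acts as an escape character in Snowflake single-quoted
    # constants, so double it first, then double embedded single quotes.
    escaped = value.replace("\\", "\\\\").replace("'", "''")
    return f"'{escaped}'"


def build_agent_query(agent_name: str, question: str) -> str:
    """Build the SELECT statement shown in this README for a given question."""
    return (
        "SELECT SNOWFLAKE.CORE.AGENT_QUERY(\n"
        f"  {sql_string_literal(agent_name)},\n"
        f"  {sql_string_literal(question)}\n"
        ");"
    )


# Fully qualified agent name, as given in this README.
AGENT = "HL7_PATHOLOGY_AI.ANALYTICS.HL7_PATHOLOGY_AGENT"

if __name__ == "__main__":
    print(build_agent_query(AGENT, "Which patients' reports mention perineural invasion?"))
```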

Project Structure

unified-healthcare-intelligence-demo/
├── README.md                               # This file
├── QUICKSTART.md                           # Detailed setup guide
│
├── sql/
│   ├── 01_setup_infrastructure.sql         # Database, schemas, roles
│   ├── 02_setup_pop_health_tables.sql      # Population health tables
│   ├── 03_setup_pathology_tables.sql       # HL7 pathology pipeline
│   ├── 04_setup_gold_analytics.sql         # Unified analytics views
│   ├── 05_setup_semantic_view.sql          # Semantic view for Analyst
│   ├── 06_setup_cortex_search.sql          # Search services
│   ├── 07_setup_intelligence_agent.sql     # Unified agent
│   └── 08_run_pipeline.sql                 # Data loading & execution
│
├── scripts/
│   └── generate_hl7_for_synthea.py         # Generate HL7 messages
│
├── data/
│   ├── csv_files/                          # Synthea patient data
│   ├── hl7_messages/                       # Generated HL7 JSON
│   ├── Medical_Transcripts/                # Clinical note PDFs
│   └── Pathology_Reports/                  # Pathology report files
│
├── streamlit/
│   ├── HL7_PATHOLOGY_DASHBOARD.py
│   └── environment.yml
│
├── openflow/
│   ├── hl7_processors-0.2.0.nar            # Custom HL7 parser NAR
│   └── Openflow-HL7-Flow.json              # NiFi flow definition
│
└── semantic_models/
    └── unified_healthcare.yaml

Intelligence Agent Tools

The unified agent includes 4 specialized tools:

| Tool | Type | Use Case |
|------|------|----------|
| query_healthcare_data | Cortex Analyst | Structured queries on patients, encounters, pathology |
| search_pathology_reports | Cortex Search | Find specific diagnoses, staging, findings in reports |
| search_medical_transcripts | Cortex Search | Search clinical notes and discharge summaries |
| search_conditions | Cortex Search | Population-wide diagnosis searches |

Sample Questions

  • "How many patients have cancer and what types are represented?"
  • "Show me all Stage III colorectal cancer patients"
  • "Which patients have MSI-High status and are immunotherapy candidates?"
  • "Compare healthcare costs between cancer and non-cancer patients"
  • "Find pathology reports mentioning perineural invasion"
  • "What is the distribution of HER2 status in breast cancer patients?"

Data Model

Population Health Schema

  • PATIENTS - Demographics, location, healthcare costs
  • ENCOUNTERS - Visits, costs, reasons
  • OBSERVATIONS - Vitals, lab results
  • CONDITIONS - Diagnoses
  • MEDICATIONS - Prescriptions
  • ALLERGIES - Adverse reactions
  • IMMUNIZATIONS - Vaccinations

Cancer Pathology Schema

  • LANDING_HL7_MESSAGES - Raw HL7 JSON (Bronze)
  • SILVER_*_SEGMENTS - Parsed HL7 segments (Silver)
  • GOLD_CANCER_PATHOLOGY_REPORTS - AI-enriched reports (Gold)
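The Bronze-to-Silver step can be pictured with a minimal Python sketch that splits one pipe-delimited HL7 v2 message into segments and groups them by segment type. This is an illustration under stated assumptions, not the repository's parser: the landing table here holds HL7 as JSON, the sample message and its local codes are invented, and mapping a segment type onto a SILVER_<TYPE>_SEGMENTS table name is a guess at what the SILVER_*_SEGMENTS wildcard stands for. parse_hl7_message and silver_table_name are hypothetical names.

```python
from collections import defaultdict


def parse_hl7_message(raw: str) -> dict:
    """Group the segments of one HL7 v2 message by segment type.

    Returns a dict mapping a segment ID (e.g. "OBX") to a list of segments,
    each segment being its list of field strings.
    """
    # The character right after "MSH" declares the field separator (MSH-1).
    field_sep = raw[3] if raw.startswith("MSH") and len(raw) > 3 else "|"
    segments = defaultdict(list)
    # Segments end with a carriage return; tolerate \n and \r\n as well.
    for line in raw.replace("\r\n", "\r").replace("\n", "\r").split("\r"):
        line = line.strip()
        if line:
            fields = line.split(field_sep)
            segments[fields[0]].append(fields)
    return dict(segments)


def silver_table_name(segment_type: str) -> str:
    """Assumed naming for the Silver tables (hypothetical expansion of '*')."""
    return f"SILVER_{segment_type}_SEGMENTS"


# Invented sample message with made-up local codes, for illustration only.
SAMPLE = "\r".join([
    "MSH|^~\\&|LAB|HOSP|EHR|HOSP|20260220120000||ORU^R01|MSG0001|P|2.7",
    "PID|1||PAT123^^^HOSP^MR||DOE^JANE||19700101|F",
    "OBR|1||SPEC42|PATH^Surgical pathology^L",
    "OBX|1|TX|DX^Final diagnosis^L||Invasive adenocarcinoma, pT3N1M0",
    "OBX|2|ST|MSI^Microsatellite instability^L||MSI-High",
])

if __name__ == "__main__":
    for segment_type, rows in parse_hl7_message(SAMPLE).items():
        print(silver_table_name(segment_type), len(rows))
```

Because MSH-1 is the separator itself, splitting an MSH segment shifts its indices by one: MSH-12 (the version) sits at list index 11, whereas in other segments field n sits at index n.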

Analytics Schema

  • PATIENT_360_VIEW - Unified patient profile
  • CANCER_POPULATION_SUMMARY - Population cancer stats
  • CANCER_MOLECULAR_PROFILE - Molecular marker analysis
  • HL7_PATHOLOGY_SEMANTIC_VIEW - Semantic model for Analyst

Technologies

| Component | Technology |…
