Snowflake-Labs/sfguide-powering-AI-Ready-Data-Products-by-unpacking-complex-HL7-data-using-Snowflake
PLpgSQL
Language: PLpgSQL
License: Apache-2.0
Stars: 1
Forks: 0
Open issues: 0
Created: 2026-02-20T16:56:50Z
Pushed: 2026-03-19T20:07:07Z
Default branch: main
Fork: no
Archived: no
README:
Unified Healthcare Intelligence Platform
Transform healthcare data into actionable intelligence by combining Population Health Analytics with Cancer Pathology Intelligence in a single, unified platform powered by Snowflake Cortex AI.
Overview
This solution demonstrates a comprehensive healthcare analytics platform that:
- Integrates Synthea-generated patient data with HL7 v2.7 cancer pathology messages
- Extracts 20+ structured clinical fields from pathology reports using Claude 3.5 Sonnet
- Enables natural language queries via a unified Snowflake Intelligence Agent
- Provides semantic search across pathology reports, medical transcripts, and conditions
- Visualizes population health metrics alongside cancer pathology insights
Key Capabilities
| Capability | Description |
|------------|-------------|
| Population Health | Patient demographics, encounters, observations, medications, conditions |
| Cancer Pathology | TNM staging, molecular markers (MSI, KRAS, BRAF, HER2, ER, PR) |
| AI Extraction | Claude 3.5 Sonnet extracts structured data from unstructured pathology text |
| Unified Agent | Single agent with 4 tools for structured queries and semantic search |
| Real-time Streaming | HL7 messages via OpenFlow with custom NiFi processor |
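The AI Extraction row above describes turning free-text pathology into structured fields. The following is a minimal sketch of what a target record and a tolerant parser for a model's JSON reply might look like. The marker names come from the table; the class name `PathologyExtraction`, the function `parse_extraction`, the JSON keys, and the sample reply are all hypothetical, since this excerpt does not list the repository's 20+ fields.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Optional

# Marker names come from the capability table; everything else is hypothetical.
MARKERS = ("MSI", "KRAS", "BRAF", "HER2", "ER", "PR")


@dataclass
class PathologyExtraction:
    """Hypothetical target record for one pathology report."""
    t_stage: Optional[str] = None
    n_stage: Optional[str] = None
    m_stage: Optional[str] = None
    markers: Dict[str, Optional[str]] = field(default_factory=dict)


def parse_extraction(raw_json: str) -> PathologyExtraction:
    """Turn a model's JSON reply into a record, leaving absent fields as None."""
    data = json.loads(raw_json)
    reported = data.get("markers") or {}
    return PathologyExtraction(
        t_stage=data.get("t_stage"),
        n_stage=data.get("n_stage"),
        m_stage=data.get("m_stage"),
        # Every known marker gets a key, so downstream columns stay stable.
        markers={name: reported.get(name) for name in MARKERS},
    )


# Hypothetical model reply, not real output from the repository's pipeline.
reply = (
    '{"t_stage": "pT3", "n_stage": "pN1", "m_stage": null, '
    '"markers": {"MSI": "MSI-High", "KRAS": "wild-type"}}'
)
record = parse_extraction(reply)
print(record)
```

Keeping unreported markers as explicit `None` values, rather than dropping them, is one way to distinguish "not mentioned in the report" from a negative result.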
Architecture
                      ┌─────────────────────────────────────────┐
                      │              DATA SOURCES               │
                      ├─────────────────────────────────────────┤
                      │  Synthea CSVs   HL7 Messages     PDFs   │
                      └──────────┬──────────┬───────────┬───────┘
                                 │          │           │
                      ┌──────────▼──────────▼───────────▼───────┐
                      │              BRONZE LAYER               │
                      │       Landing tables for raw data       │
                      └──────────┬──────────┬───────────────────┘
                                 │          │
                      ┌──────────▼──────────▼───────────────────┐
                      │              SILVER LAYER               │
                      │  Parsed HL7 segments, patient records   │
                      └──────────┬──────────┬───────────────────┘
                                 │          │
                      ┌──────────▼──────────▼───────────────────┐
                      │               GOLD LAYER                │
                      │ Patient 360, Cancer Reports, Analytics  │
                      └──────────┬──────────────────────────────┘
                                 │
       ┌─────────────────────────┼─────────────────────────┐
       │                         │                         │
┌──────▼──────┐          ┌───────▼───────┐         ┌───────▼───────┐
│  Semantic   │          │ Cortex Search │         │   Streamlit   │
│    View     │          │   Services    │         │   Dashboard   │
└──────┬──────┘          └───────┬───────┘         └───────────────┘
       │                         │
       └────────────┬────────────┘
                    │
           ┌────────▼────────┐
           │  HL7 PATHOLOGY  │
           │  INTELLIGENCE   │
           │      AGENT      │
           └─────────────────┘
Quick Start
Prerequisites
- Snowflake account with ACCOUNTADMIN access
- Cortex AI features enabled
- Snowflake Intelligence available
- (Optional) OpenFlow for HL7 streaming
Installation Steps
1. Clone or download this repository
2. Run infrastructure setup
-- Execute in order:
@sql/01_setup_infrastructure.sql
@sql/02_setup_pop_health_tables.sql
@sql/03_setup_pathology_tables.sql
@sql/04_setup_gold_analytics.sql
@sql/05_setup_semantic_view.sql
@sql/06_setup_cortex_search.sql
@sql/07_setup_intelligence_agent.sql
3. Generate HL7 pathology data
python scripts/generate_hl7_for_synthea.py
4. Load data and run pipeline
@sql/08_run_pipeline.sql
5. Deploy Streamlit dashboard
- Upload streamlit/HL7_PATHOLOGY_DASHBOARD.py to Snowsight
- Or run locally with a Snowflake connection
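Because the SQL scripts must run in numeric order, a small pre-flight check can catch a missing or misnamed file before anything is executed. This is a sketch only: the file names are the ones listed in the installation steps, while the helper `ordered_scripts` is a hypothetical name and not part of the repository. It does not connect to Snowflake or run any SQL.

```python
import re

# Script names as listed in the installation steps (deliberately shuffled here).
SCRIPTS = [
    "08_run_pipeline.sql",
    "03_setup_pathology_tables.sql",
    "01_setup_infrastructure.sql",
    "06_setup_cortex_search.sql",
    "02_setup_pop_health_tables.sql",
    "07_setup_intelligence_agent.sql",
    "04_setup_gold_analytics.sql",
    "05_setup_semantic_view.sql",
]


def ordered_scripts(names):
    """Sort scripts by numeric prefix and fail if the sequence has gaps."""
    def prefix(name):
        match = re.match(r"(\d+)_", name)
        if match is None:
            raise ValueError(f"no numeric prefix: {name}")
        return int(match.group(1))

    ordered = sorted(names, key=prefix)
    numbers = [prefix(name) for name in ordered]
    expected = list(range(1, len(ordered) + 1))
    if numbers != expected:
        raise ValueError(f"expected prefixes {expected}, got {numbers}")
    return ordered


for script in ordered_scripts(SCRIPTS):
    print(f"sql/{script}")
```

Note that per the steps above, the HL7 generation script runs after the seven setup scripts and before 08_run_pipeline.sql, so a real runner would insert that step between the last two entries.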
Test the Agent
SELECT SNOWFLAKE.CORE.AGENT_QUERY(
    'HL7_PATHOLOGY_AI.ANALYTICS.HL7_PATHOLOGY_AGENT',
    'How many patients have cancer and what types?'
);
Project Structure
unified-healthcare-intelligence-demo/
├── README.md                           # This file
├── QUICKSTART.md                       # Detailed setup guide
│
├── sql/
│   ├── 01_setup_infrastructure.sql     # Database, schemas, roles
│   ├── 02_setup_pop_health_tables.sql  # Population health tables
│   ├── 03_setup_pathology_tables.sql   # HL7 pathology pipeline
│   ├── 04_setup_gold_analytics.sql     # Unified analytics views
│   ├── 05_setup_semantic_view.sql      # Semantic view for Analyst
│   ├── 06_setup_cortex_search.sql      # Search services
│   ├── 07_setup_intelligence_agent.sql # Unified agent
│   └── 08_run_pipeline.sql             # Data loading & execution
│
├── scripts/
│   └── generate_hl7_for_synthea.py     # Generate HL7 messages
│
├── data/
│   ├── csv_files/                      # Synthea patient data
│   ├── hl7_messages/                   # Generated HL7 JSON
│   ├── Medical_Transcripts/            # Clinical note PDFs
│   └── Pathology_Reports/              # Pathology report files
│
├── streamlit/
│   ├── HL7_PATHOLOGY_DASHBOARD.py
│   └── environment.yml
│
├── openflow/
│   ├── hl7_processors-0.2.0.nar        # Custom HL7 parser NAR
│   └── Openflow-HL7-Flow.json          # NiFi flow definition
│
└── semantic_models/
    └── unified_healthcare.yaml
Intelligence Agent Tools
The unified agent includes 4 specialized tools:
| Tool | Type | Use Case |
|------|------|----------|
| query_healthcare_data | Cortex Analyst | Structured queries on patients, encounters, pathology |
| search_pathology_reports | Cortex Search | Find specific diagnoses, staging, findings in reports |
| search_medical_transcripts | Cortex Search | Search clinical notes and discharge summaries |
| search_conditions | Cortex Search | Population-wide diagnosis searches |
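The tool table can also be kept as data, for example to sanity-check a configuration or generate documentation. The sketch below mirrors the four rows above; the `AgentTool` type and `tools_by_kind` helper are hypothetical names introduced for illustration. It performs no routing: in the actual platform the agent decides which tool to call.

```python
from collections import Counter
from typing import NamedTuple


class AgentTool(NamedTuple):
    """One row of the tool table (hypothetical helper type)."""
    name: str
    kind: str
    use_case: str


# Values copied from the tool table.
TOOLS = [
    AgentTool("query_healthcare_data", "Cortex Analyst",
              "Structured queries on patients, encounters, pathology"),
    AgentTool("search_pathology_reports", "Cortex Search",
              "Find specific diagnoses, staging, findings in reports"),
    AgentTool("search_medical_transcripts", "Cortex Search",
              "Search clinical notes and discharge summaries"),
    AgentTool("search_conditions", "Cortex Search",
              "Population-wide diagnosis searches"),
]


def tools_by_kind(tools):
    """Count how many tools are backed by each service type."""
    return Counter(tool.kind for tool in tools)


# Tool names must be unique for an agent to address them unambiguously.
names = [tool.name for tool in TOOLS]
assert len(names) == len(set(names)), "duplicate tool name"

print(tools_by_kind(TOOLS))
```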
Sample Questions
- "How many patients have cancer and what types are represented?"
- "Show me all Stage III colorectal cancer patients"
- "Which patients have MSI-High status and are immunotherapy candidates?"
- "Compare healthcare costs between cancer and non-cancer patients"
- "Find pathology reports mentioning perineural invasion"
- "What is the distribution of HER2 status in breast cancer patients?"
Data Model
Population Health Schema
- PATIENTS - Demographics, location, healthcare costs
- ENCOUNTERS - Visits, costs, reasons
- OBSERVATIONS - Vitals, lab results
- CONDITIONS - Diagnoses
- MEDICATIONS - Prescriptions
- ALLERGIES - Adverse reactions
- IMMUNIZATIONS - Vaccinations
Cancer Pathology Schema
- LANDING_HL7_MESSAGES - Raw HL7 JSON (Bronze)
- SILVER_*_SEGMENTS - Parsed HL7 segments (Silver)
- GOLD_CANCER_PATHOLOGY_REPORTS - AI-enriched reports (Gold)
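The Bronze to Silver step splits each raw message into per-segment rows. As a rough illustration of what "parsed HL7 segments" means, the sketch below splits a classic pipe-delimited HL7 v2 message into segments and fields. It is not the repository's parser (that work is done by the SQL pipeline and the custom NiFi processor, and the repository lands messages as JSON); the sample message, its placeholder codes, and the function name `parse_hl7_segments` are assumptions made for this example.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical sample message with placeholder codes (not from the repository).
SAMPLE_HL7 = "\r".join([
    "MSH|^~\\&|LAB|HOSPITAL|EHR|HOSPITAL|20240101120000||ORU^R01|MSG0001|P|2.7",
    "PID|1||PAT123^^^HOSPITAL^MR||DOE^JANE||19700101|F",
    "OBR|1||ORD456|PATH^Surgical pathology^L",
    "OBX|1|TX|DX^Final diagnosis^L||Invasive adenocarcinoma, pT3 pN1",
])


def parse_hl7_segments(message: str) -> dict[str, list[list[str]]]:
    """Group a message's segments by segment ID, each split into fields."""
    segments = defaultdict(list)
    # HL7 v2 separates segments with carriage returns; accept newlines too.
    for line in message.replace("\n", "\r").split("\r"):
        line = line.strip()
        if not line:
            continue
        fields = line.split("|")
        segments[fields[0]].append(fields)
    return dict(segments)


parsed = parse_hl7_segments(SAMPLE_HL7)

# For most segments, list index n holds field n (PID-3 is index 3).
patient_id = parsed["PID"][0][3].split("^")[0]
diagnosis_text = parsed["OBX"][0][5]

# MSH is the exception: the field separator itself counts as MSH-1,
# so MSH-n sits at index n - 1 (MSH-9, the message type, is index 8).
message_type = parsed["MSH"][0][8]

print(patient_id, message_type, diagnosis_text, sep=" | ")
```

A production parser would also honor the delimiters declared in MSH-2 (component, repetition, escape, subcomponent) rather than assuming the defaults used here.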
Analytics Schema
- PATIENT_360_VIEW - Unified patient profile
- CANCER_POPULATION_SUMMARY - Population cancer stats
- CANCER_MOLECULAR_PROFILE - Molecular marker analysis
- HL7_PATHOLOGY_SEMANTIC_VIEW - Semantic model for Analyst
Technologies
| Component | Technology |…
Excerpt shown — open the source for the full document.