Snowflake-Labs/sfguide-data-retention-and-purge
PLpgSQL
Captured source
source ↗Snowflake-Labs/sfguide-data-retention-and-purge
Language: PLpgSQL
License: Apache-2.0
Stars: 0
Forks: 0
Open issues: 0
Created: 2025-12-22T22:26:19Z
Pushed: 2025-12-22T22:36:37Z
Default branch: main
Fork: no
Archived: no
README:
Snowflake Guide: Data Retention and Purge
A metadata-driven framework for managing data retention, PII redaction and legal holds in Snowflake.
Overview
This repository contains SQL scripts and stored procedures to implement a comprehensive data lifecycle management system that supports:
- Retention Management: Automatically delete data based on configurable retention periods
- PII Redaction: Redact personally identifiable information (PII) before deletion
- Legal Holds: Prevent deletion of data subject to litigation, audits, or investigations
- Right to Forget: Support GDPR and privacy compliance with immediate PII redaction
Architecture
Metadata Tables
1. DATA_RETENTION_TABLE_INFO: Master table defining retention policies for each table
- Retention period (years)
- Redaction period (years)
- PII flags
- Reference join conditions for dependent tables
2. DATA_RETENTION_PII_COLUMNS: Defines PII columns and their redaction rules
- Column name
- PII type (NAME, ADDRESS, PHONE, etc.)
- Redaction rule (NULL, HASH, MASK, ENCRYPT)
3. DATA_RETENTION_HOLD_RECORDS: Tracks legal holds and retention exceptions
- Entity identification (CUSTOMER ID, etc.)
- Hold reason (Litigation, Regulatory, Investigation)
- Hold duration (start/end dates)
Key Concepts
Retention + Redaction Timeline
For tables with PII (e.g., CUSTOMER with 30-year retention + 2-year redaction):
0 years ←→ 30 years ←→ 32 years ←→ older [Keep as-is] [Redact PII] [Delete]
DELETE Cutoff = RETENTION + REDACTION years
This ensures: 1. Data is retained with PII for 30 years (retention period) 2. PII is redacted after 30 years and kept for 2 additional years (redaction period) 3. Total lifecycle = 32 years before deletion
Directory Structure
sfguide-data-retention-and-purge/ ├── sql/ │ ├── setup/ │ │ ├── 01_SETUP_ENVIRONMENT.sql │ │ ├── 02_LOAD_SAMPLE_METADATA.sql │ │ └── 99_CLEANUP.sql │ ├── metadata_tables/ │ │ ├── 01_DATA_RETENTION_TABLE_INFO.sql │ │ ├── 02_DATA_RETENTION_PII_COLUMNS.sql │ │ └── 03_DATA_RETENTION_HOLD_RECORDS.sql │ ├── stored_procedures/ │ │ ├── SP_REDACT_CUSTOMER_PII.sql │ │ └── TEST_SP_REDACT_CUSTOMER_PII.sql │ ├── storage_policies/ │ │ ├── 01_STORAGE_LIFECYCLE_POLICIES.sql │ │ ├── 02_APPLY_STORAGE_POLICIES.sql │ │ └── 03_MONITORING_AND_RETRIEVAL.sql │ ├── direct_delete/ │ │ └── DELETE_EXAMPLES.sql │ └── pii_and_holds/ │ ├── CUSTOMER_PII_REDACTION.sql │ └── LEGAL_HOLD_EXAMPLES.sql └── README.md
Setup Instructions
Run the setup scripts:
-- Step 1: Create environment (warehouse, database, schemas, TPCH data, metadata tables) @sql/setup/01_SETUP_ENVIRONMENT.sql -- Step 2: Load sample metadata @sql/setup/02_LOAD_SAMPLE_METADATA.sql
This creates:
- Database: DB_SAMPLE_SLP
- Schemas:
- GOVERNANCE (metadata tables)
- DATA_SLP (for Storage Lifecycle Policy approach)
- DATA_DELETE (for Direct Delete approach)
- Sample Data: TPCH tables in both DATA_SLP and DATA_DELETE schemas (~6M rows)
- Metadata: Pre-configured retention policies and PII column definitions
Cleanup
To remove all objects created by the setup:
-- Warning: This will permanently delete all data and objects @sql/setup/99_CLEANUP.sql
This drops:
- Database (DB_SAMPLE_SLP) with all schemas and tables
- Warehouse (DLM_WH)
- Storage lifecycle policies
- Stored procedures
Usage Examples
Storage lifecycle policies
Automated data lifecycle management using Snowflake's native storage policies:
See sql/storage_policies/ for:
01_STORAGE_LIFECYCLE_POLICIES.sql- Policy definitions02_APPLY_STORAGE_POLICIES.sql- Applying policies to tables03_MONITORING_AND_RETRIEVAL.sql- Monitoring and retrieving archived data
Data Deletion
Direct SQL examples for deleting old data based on retention policies:
See sql/direct_delete/DELETE_EXAMPLES.sql for ORDERS and LINEITEM deletion examples.
Legal Holds
Place, manage and release legal holds on customer data:
See sql/pii_and_holds/LEGAL_HOLD_EXAMPLES.sql for complete legal hold examples including:
- Checking for active holds
- Placing new holds
- Releasing holds
- Deletion with hold checks
PII Redaction
Use the stored procedure to redact PII for specific customers:
CALL SP_REDACT_CUSTOMER_PII('100,101,102,103,104');Storage lifecycle policies
Snowflake's native storage lifecycle policies provide automated, declarative data lifecycle management. These policies complement the stored procedure approach.
Policy Lifecycle
1. Active Data ↓ 2. Policy Evaluation (automatic) ↓ 3. Archive to COOL Storage (retrievable) ↓ 4. Wait ARCHIVE_FOR_DAYS (90 days) ↓ 5. Permanent Deletion (not retrievable)
See the following scripts for detailed examples:
sql/storage_policies/02_APPLY_STORAGE_POLICIES.sql- Applying policies to tablessql/storage_policies/03_MONITORING_AND_RETRIEVAL.sql- Monitoring and retrieving archived data
For complete documentation on Storage lifecycle policies, refer to Snowflake Documentation.
Sample Redaction Rules
| Rule | Description | Example | |------|-------------|---------| | NULL | Set column to NULL | C_ADDRESS = NULL | | REDACT | Anonymize with unique ID | C_NAME = 'REDACTED_' || C_CUSTKEY | | MASK | Show last 4 chars, mask rest with X | XXX-XXX-1234 | | ENCRYPT | Encrypt value (placeholder) | ***ENCRYPTED*** |
Sample Data (TPCH Database)
The PoC uses TPCH sample data in two separate schemas:
Schema Organization
| Schema | Purpose | Approach | |--------|---------|----------| | GOVERNANCE | Metadata tables | Shared by both approaches | | DATA_SLP | Sample data | Storage Lifecycle Policy (automated) | | DATA_DELETE | Sample data | Direct Delete (stored procedures) |
Table Configuration
| Table | Retention | Redaction | PII | Delete Cutoff | |-------|-----------|-----------|-----|---------------| | ORDERS | 30 years | - | No | 30 years | | LINEITEM | 30 years | - | No | 30 years | | CUSTOMER | 30 years | 2 years | Yes | 32 years |
Data Volume (TPCH_SF1):
- LINEITEM: ~6M rows
- ORDERS: ~1.5M rows
- CUSTOMER: ~150K rows
- Other tables:…
Excerpt shown — open the source for the full document.
Notability
notability 3.0/10Routine documentation repo, low traction.