RepoSnowflake (Arctic)Snowflake (Arctic)published Dec 22, 2025seen 5d

Snowflake-Labs/sfguide-data-retention-and-purge

PLpgSQL

Open original ↗

Captured source

source ↗

Snowflake-Labs/sfguide-data-retention-and-purge

Language: PLpgSQL

License: Apache-2.0

Stars: 0

Forks: 0

Open issues: 0

Created: 2025-12-22T22:26:19Z

Pushed: 2025-12-22T22:36:37Z

Default branch: main

Fork: no

Archived: no

README:

Snowflake Guide: Data Retention and Purge

A metadata-driven framework for managing data retention, PII redaction and legal holds in Snowflake.

Overview

This repository contains SQL scripts and stored procedures to implement a comprehensive data lifecycle management system that supports:

  • Retention Management: Automatically delete data based on configurable retention periods
  • PII Redaction: Redact personally identifiable information (PII) before deletion
  • Legal Holds: Prevent deletion of data subject to litigation, audits, or investigations
  • Right to Forget: Support GDPR and privacy compliance with immediate PII redaction

Architecture

Metadata Tables

1. DATA_RETENTION_TABLE_INFO: Master table defining retention policies for each table

  • Retention period (years)
  • Redaction period (years)
  • PII flags
  • Reference join conditions for dependent tables

2. DATA_RETENTION_PII_COLUMNS: Defines PII columns and their redaction rules

  • Column name
  • PII type (NAME, ADDRESS, PHONE, etc.)
  • Redaction rule (NULL, HASH, MASK, ENCRYPT)

3. DATA_RETENTION_HOLD_RECORDS: Tracks legal holds and retention exceptions

  • Entity identification (CUSTOMER ID, etc.)
  • Hold reason (Litigation, Regulatory, Investigation)
  • Hold duration (start/end dates)

Key Concepts

Retention + Redaction Timeline

For tables with PII (e.g., CUSTOMER with 30-year retention + 2-year redaction):

0 years ←→ 30 years ←→ 32 years ←→ older
[Keep as-is] [Redact PII] [Delete]

DELETE Cutoff = RETENTION + REDACTION years

This ensures: 1. Data is retained with PII for 30 years (retention period) 2. PII is redacted after 30 years and kept for 2 additional years (redaction period) 3. Total lifecycle = 32 years before deletion

Directory Structure

sfguide-data-retention-and-purge/
├── sql/
│ ├── setup/
│ │ ├── 01_SETUP_ENVIRONMENT.sql
│ │ ├── 02_LOAD_SAMPLE_METADATA.sql
│ │ └── 99_CLEANUP.sql
│ ├── metadata_tables/
│ │ ├── 01_DATA_RETENTION_TABLE_INFO.sql
│ │ ├── 02_DATA_RETENTION_PII_COLUMNS.sql
│ │ └── 03_DATA_RETENTION_HOLD_RECORDS.sql
│ ├── stored_procedures/
│ │ ├── SP_REDACT_CUSTOMER_PII.sql
│ │ └── TEST_SP_REDACT_CUSTOMER_PII.sql
│ ├── storage_policies/
│ │ ├── 01_STORAGE_LIFECYCLE_POLICIES.sql
│ │ ├── 02_APPLY_STORAGE_POLICIES.sql
│ │ └── 03_MONITORING_AND_RETRIEVAL.sql
│ ├── direct_delete/
│ │ └── DELETE_EXAMPLES.sql
│ └── pii_and_holds/
│ ├── CUSTOMER_PII_REDACTION.sql
│ └── LEGAL_HOLD_EXAMPLES.sql
└── README.md

Setup Instructions

Run the setup scripts:

-- Step 1: Create environment (warehouse, database, schemas, TPCH data, metadata tables)
@sql/setup/01_SETUP_ENVIRONMENT.sql

-- Step 2: Load sample metadata
@sql/setup/02_LOAD_SAMPLE_METADATA.sql

This creates:

  • Database: DB_SAMPLE_SLP
  • Schemas:
  • GOVERNANCE (metadata tables)
  • DATA_SLP (for Storage Lifecycle Policy approach)
  • DATA_DELETE (for Direct Delete approach)
  • Sample Data: TPCH tables in both DATA_SLP and DATA_DELETE schemas (~6M rows)
  • Metadata: Pre-configured retention policies and PII column definitions

Cleanup

To remove all objects created by the setup:

-- Warning: This will permanently delete all data and objects
@sql/setup/99_CLEANUP.sql

This drops:

  • Database (DB_SAMPLE_SLP) with all schemas and tables
  • Warehouse (DLM_WH)
  • Storage lifecycle policies
  • Stored procedures

Usage Examples

Storage lifecycle policies

Automated data lifecycle management using Snowflake's native storage policies:

See sql/storage_policies/ for:

  • 01_STORAGE_LIFECYCLE_POLICIES.sql - Policy definitions
  • 02_APPLY_STORAGE_POLICIES.sql - Applying policies to tables
  • 03_MONITORING_AND_RETRIEVAL.sql - Monitoring and retrieving archived data

Data Deletion

Direct SQL examples for deleting old data based on retention policies:

See sql/direct_delete/DELETE_EXAMPLES.sql for ORDERS and LINEITEM deletion examples.

Legal Holds

Place, manage and release legal holds on customer data:

See sql/pii_and_holds/LEGAL_HOLD_EXAMPLES.sql for complete legal hold examples including:

  • Checking for active holds
  • Placing new holds
  • Releasing holds
  • Deletion with hold checks

PII Redaction

Use the stored procedure to redact PII for specific customers:

CALL SP_REDACT_CUSTOMER_PII('100,101,102,103,104');

Storage lifecycle policies

Snowflake's native storage lifecycle policies provide automated, declarative data lifecycle management. These policies complement the stored procedure approach.

Policy Lifecycle

1. Active Data
↓
2. Policy Evaluation (automatic)
↓
3. Archive to COOL Storage (retrievable)
↓
4. Wait ARCHIVE_FOR_DAYS (90 days)
↓
5. Permanent Deletion (not retrievable)

See the following scripts for detailed examples:

  • sql/storage_policies/02_APPLY_STORAGE_POLICIES.sql - Applying policies to tables
  • sql/storage_policies/03_MONITORING_AND_RETRIEVAL.sql - Monitoring and retrieving archived data

For complete documentation on Storage lifecycle policies, refer to Snowflake Documentation.

Sample Redaction Rules

| Rule | Description | Example | |------|-------------|---------| | NULL | Set column to NULL | C_ADDRESS = NULL | | REDACT | Anonymize with unique ID | C_NAME = 'REDACTED_' || C_CUSTKEY | | MASK | Show last 4 chars, mask rest with X | XXX-XXX-1234 | | ENCRYPT | Encrypt value (placeholder) | ***ENCRYPTED*** |

Sample Data (TPCH Database)

The PoC uses TPCH sample data in two separate schemas:

Schema Organization

| Schema | Purpose | Approach | |--------|---------|----------| | GOVERNANCE | Metadata tables | Shared by both approaches | | DATA_SLP | Sample data | Storage Lifecycle Policy (automated) | | DATA_DELETE | Sample data | Direct Delete (stored procedures) |

Table Configuration

| Table | Retention | Redaction | PII | Delete Cutoff | |-------|-----------|-----------|-----|---------------| | ORDERS | 30 years | - | No | 30 years | | LINEITEM | 30 years | - | No | 30 years | | CUSTOMER | 30 years | 2 years | Yes | 32 years |

Data Volume (TPCH_SF1):

  • LINEITEM: ~6M rows
  • ORDERS: ~1.5M rows
  • CUSTOMER: ~150K rows
  • Other tables:…

Excerpt shown — open the source for the full document.

Notability

notability 3.0/10

Routine documentation repo, low traction.