Snowflake-Labs/ai-ready-data
Description: Framework to define AI-ready data and Skills for coding agents to assess and make your data AI-ready
License: Apache-2.0
Stars: 73
Forks: 16
Open issues: 2
Created: 2026-03-02T16:45:18Z
Pushed: 2026-05-11T16:09:19Z
Default branch: main
Fork: no
Archived: no
README:
The AI-Ready Data Framework
Introduction
The AI-Ready Data Framework is an open standard that defines what "AI-ready" actually means — six factors, 62 measurable requirements, and five workload profiles that apply to any data platform.
The AI-Ready Data Agent is an installable skill that puts the framework to work. Point it at your data estate, tell it your workload — RAG, agents, feature serving, or training — and it scores every requirement, surfaces gaps, and guides you through remediation.
The framework defines what to measure. The agent measures it and fixes what doesn't pass.
Background
The contributors to this framework include practicing data engineers, ML engineers, and platform architects who have built and operated AI systems across industries.
This repo synthesizes our collective experience building data infrastructure that can reliably power AI. Our goal is to help data practitioners design infrastructure that produces trustworthy AI decisions.
Who should use this repo?
- Data engineers building pipelines that power AI systems.
- Platform teams designing infrastructure for ML and AI workloads.
- Architects evaluating whether their stack can support RAG, agents, or real-time inference.
- Data leaders who need to assess organizational AI readiness and communicate gaps to their teams.
- Coding agents building the data infrastructure they will eventually consume.
The Six Factors of AI-Ready Data
1. [Clean](factors/0-clean.md): Clean data is consistently accurate, complete, valid, and free of errors that would compromise downstream consumption.
2. [Contextual](factors/1-contextual.md): Meaning is explicit and colocated with the data. No external lookup, tribal knowledge, or human context is required to take action on the data.
3. [Consumable](factors/2-consumable.md): Data is served in the right format and at the right latencies for AI workloads.
4. [Current](factors/3-current.md): Data reflects the present state, with freshness enforced by infrastructure rather than assumed by convention.
5. [Correlated](factors/4-correlated.md): Data is traceable from source to every decision it informs.
6. [Compliant](factors/5-compliant.md): Data is governed with explicit ownership, enforced access boundaries, and AI-specific safeguards.
These factors apply to any data system powering AI applications, regardless of tech stack.
Requirements
Each factor is backed by a set of measurable requirements — specific, platform-agnostic criteria that define what must be true of your data. Requirements describe the *what*, not the *how*. The full canonical list lives in the [skill manifest](skills/ai-ready-data/requirements/requirements.yaml).
The factor markdown files above describe the *why* and *what* of each factor in prose. The manifest provides the machine-readable counterpart: every requirement has a unique key, a description, a factor, and a scope (schema, table, or column). All tests return a normalized score between 0 and 1, making it straightforward to build automated assessments or dashboards on top of the framework.
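As a rough illustration of the manifest fields described above, the sketch below models one requirement entry and a normalized score in Python. The `Requirement` class, the `completeness_score` helper, the description text, and the choice of `column` scope for `data_completeness` are all assumptions made for illustration; the real schema is whatever `requirements.yaml` defines.

```python
from dataclasses import dataclass

# Scopes named in the README; the class itself is hypothetical.
VALID_SCOPES = {"schema", "table", "column"}


@dataclass(frozen=True)
class Requirement:
    key: str          # unique requirement key
    description: str  # human-readable description
    factor: str       # one of the six factors, e.g. "clean"
    scope: str        # "schema", "table", or "column"

    def __post_init__(self):
        if self.scope not in VALID_SCOPES:
            raise ValueError(f"unknown scope: {self.scope!r}")


def completeness_score(non_null_rows: int, total_rows: int) -> float:
    """Hypothetical test: fraction of non-null rows, clamped to [0, 1]."""
    if total_rows <= 0:
        return 0.0  # assumption: an empty table scores 0
    return max(0.0, min(1.0, non_null_rows / total_rows))


req = Requirement(
    key="data_completeness",
    description="Illustrative description, not taken from the manifest.",
    factor="clean",
    scope="column",  # assumed scope for this example
)
score = completeness_score(non_null_rows=750, total_rows=1000)
print(req.key, score)
```

Keeping every score in the same 0 to 1 range is what lets a dashboard compare requirements from different factors side by side.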
AI-Ready Data Agent (Skill)
An installable skill for coding agents. Use it to scan your data estate for prioritization, assess specific assets against a profile, and get a scored report across the six factors of AI-ready data with guided remediation.
The skill currently supports Snowflake. Contributions that extend it to other platforms are welcome.
Quick Start
Install as a skill
npx skills add Snowflake-Labs/ai-ready-data -a cortex
Standalone
Clone or add this repo as workspace context. The agent reads skills/ai-ready-data/SKILL.md automatically.
Start assessment
After installing, ask your coding agent:
> Assess my [data assets] for RAG readiness.
The agent asks for your platform and scope, loads the RAG profile, runs checks, and presents a scored report. From there you can drill into failures and remediate stage by stage.
For estate-level prioritization:
> Scan my data estate for AI readiness.
The agent sweeps across all schemas in a database with lightweight readiness proxies and presents a comparative ranking.
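One way to picture a comparative ranking is sketched below. It assumes each schema has a handful of proxy scores in the 0 to 1 range and ranks schemas by their unweighted mean, least ready first. The function name, the schema names, the chosen proxies, and the averaging rule are all hypothetical; the README does not say how the agent actually combines proxies.

```python
from statistics import mean


def rank_schemas(proxy_scores):
    """Return (schema, mean score) pairs sorted so the least ready schema comes first."""
    averaged = {
        schema: mean(scores.values())
        for schema, scores in proxy_scores.items()
    }
    return sorted(averaged.items(), key=lambda item: item[1])


# Hypothetical estate with two schemas and two proxy scores each.
estate = {
    "SUPPORT": {"data_completeness": 1.0, "semantic_documentation": 0.5},
    "SALES": {"data_completeness": 0.5, "semantic_documentation": 0.25},
}

ranking = rank_schemas(estate)
for schema, avg in ranking:
    print(f"{schema}: {avg:.2f}")
```

Sorting the weakest schemas to the top is a design choice for prioritization; reversing the sort would give a leaderboard instead.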
How It Works
Three phases, from light to deep: Scan, Assess, Remediate.
1. Choose a platform: Snowflake, Postgres, etc.
2. Discovery: tell the agent your database, schema, and tables, or scan your entire estate
3. Choose a profile: RAG, feature serving, training, agents, full assessment, or pick specific requirements
4. Adjust: skip, set, or add requirements before running
5. Coverage: see what's runnable on your platform before executing
6. Assess: platform-specific checks score each requirement 0–1
7. Remediate: for failures, the agent presents platform-specific fixes for your approval
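The Adjust step can be pictured as a small transformation on a profile. The sketch below assumes a profile is a mapping from requirement key to threshold and that "set" means overriding a threshold; both are assumptions, as are the function name and the example values. The keys come from the factor table in this README, but the fragment is not the actual contents of any built-in profile.

```python
def adjust_profile(profile, skip=(), set_thresholds=None, add=None):
    """Return a new profile with requirements skipped, re-thresholded, or added."""
    skipped = set(skip)
    # Drop skipped requirements without mutating the original profile.
    adjusted = {key: t for key, t in profile.items() if key not in skipped}
    # Override thresholds only for requirements that are still present.
    for key, threshold in (set_thresholds or {}).items():
        if key not in adjusted:
            raise KeyError(f"cannot set unknown requirement: {key}")
        adjusted[key] = threshold
    # Add extra requirements with their own thresholds.
    for key, threshold in (add or {}).items():
        adjusted[key] = threshold
    return adjusted


# Hypothetical profile fragment: requirement key -> threshold.
base = {
    "embedding_coverage": 0.9,
    "vector_index_coverage": 0.9,
    "data_freshness": 0.8,
}

custom = adjust_profile(
    base,
    skip=["vector_index_coverage"],
    set_thresholds={"data_freshness": 0.95},
    add={"column_masking": 1.0},
)
print(custom)
```

Returning a copy rather than editing the profile in place keeps the built-in definition intact for later runs.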
Factor Stages
Every assessment is organized into six stages, one per factor of AI-ready data:
| Factor | Example Requirements |
| ---------- | ------------------------------------------------------------------------------- |
| Clean | data_completeness, uniqueness, referential_integrity |
| Contextual | semantic_documentation, relationship_declaration, entity_identifier_declaration |
| Consumable | embedding_coverage, vector_index_coverage, serving_latency_compliance |
| Current | change_detection, data_freshness, incremental_update_coverage |
| Correlated | data_provenance, lineage_completeness, agent_attribution |
| Compliant | classification, column_masking, access_audit_coverage |
All scores are 0–1 where 1.0 is perfect. Requirements pass when score >= threshold.
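The pass rule above (score >= threshold) and the grouping by factor can be sketched as follows. The `build_report` function, the tuple layout, and every score and threshold value are hypothetical; only the requirement keys, the factor names, and the comparison rule come from this README.

```python
from collections import defaultdict


def build_report(results):
    """Group (factor, key, score, threshold) tuples by factor and mark pass/fail."""
    report = defaultdict(list)
    for factor, key, score, threshold in results:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{key}: score {score} is outside 0-1")
        report[factor].append({
            "key": key,
            "score": score,
            "threshold": threshold,
            "passed": score >= threshold,  # a score equal to the threshold passes
        })
    return dict(report)


# Made-up scores and thresholds for three requirements.
results = [
    ("Clean", "data_completeness", 0.98, 0.95),
    ("Clean", "uniqueness", 0.80, 0.99),
    ("Current", "data_freshness", 0.90, 0.90),
]

report = build_report(results)
failures = [
    row["key"]
    for rows in report.values()
    for row in rows
    if not row["passed"]
]
print(failures)
```

The list of failures is the natural input to the remediation phase, since only those requirements need fixes.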
Built-In Profiles
| Profile | Requirements | Best For |
| --------------- | ------------ | ------------------------------------------------------------------------------------------ |
| scan | 8 | Estate-level sweep: lightweight readiness proxies for portfolio analysis and prioritization |
| rag | 27 | Retrieval-augmented generation: chunking, embeddings, vector search, document governance |
| feature-serving | 39 | Online feature stores:…
Excerpt shown — open the source for the full document.
Notability: 4.0/10. New repo with 73 stars, moderate traction.