Snowflake-Labs/sfguide-extracting-insights-from-video-with-multimodal-ai-analysis
Language: Python
License: Apache-2.0
Stars: 3
Forks: 7
Open issues: 0
Created: 2025-05-12T18:08:09Z
Pushed: 2026-06-02T09:05:54Z
Default branch: main
Fork: no
Archived: no
README:
Extracting Insights from Video with Multimodal AI Analysis
Overview
In this guide, we’ll take text-rich videos (instructional content, meetings) and extract still images and audio. To perform OCR and speech recognition, we’ll process the images and audio through Snowflake Cortex AI using PARSE_DOCUMENT and AI_TRANSCRIBE, respectively. To extract key moments and semantic events, we’ll then run the content through Qwen2.5-VL on Snowpark Container Services (SPCS). Lastly, we’ll store the analysis from all three models in tables so that analytical queries about meeting productivity can be run on the data.
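The first step above (splitting a video into still images and an audio track) can be sketched with ffmpeg. This is a minimal illustration, not necessarily how the guide itself does it: the helper name `build_ffmpeg_commands`, the output file naming, the 10-second sampling interval, and the 16 kHz mono WAV format are all assumptions chosen for the example. The function only builds the command lines, so it can be inspected without ffmpeg installed.

```python
from pathlib import Path


def build_ffmpeg_commands(video, out_dir, seconds_per_frame=10):
    """Build ffmpeg argument lists for stills and audio (hypothetical helper).

    Returns a (frames_cmd, audio_cmd) tuple; nothing is executed here.
    """
    out = Path(out_dir)
    stem = Path(video).stem

    # Sample one still every `seconds_per_frame` seconds into numbered JPEGs.
    frames_cmd = [
        "ffmpeg", "-i", video,
        "-vf", f"fps=1/{seconds_per_frame}",
        (out / f"{stem}_%04d.jpg").as_posix(),
    ]

    # Drop the video stream (-vn) and write 16 kHz mono 16-bit PCM audio.
    # The sample rate and channel count are assumptions, not requirements
    # stated by the guide.
    audio_cmd = [
        "ffmpeg", "-i", video,
        "-vn", "-acodec", "pcm_s16le", "-ar", "16000", "-ac", "1",
        (out / f"{stem}.wav").as_posix(),
    ]
    return frames_cmd, audio_cmd
```

With ffmpeg on the PATH, each returned list could be passed to `subprocess.run(cmd, check=True)`. The resulting images and audio file would then be uploaded to a Snowflake stage for the PARSE_DOCUMENT and AI_TRANSCRIBE steps.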
Step-by-Step Guide
For prerequisites, environment setup, and step-by-step instructions, please refer to the QuickStart Guide.
Dataset
This repository uses the AMI Meeting Corpus dataset:
- Source: Edinburgh University (http://groups.inf.ed.ac.uk/ami/corpus/)
- Citation: Carletta, J. et al. (2005). The AMI meeting corpus: A pre-announcement. In Proc. MLMI, pp. 28-39.
- License: Creative Commons Attribution 4.0
- Date Accessed: May 22, 2025
Notability
Notability: 1.0/10 (low-stars tutorial repo)