Repo · Snowflake (Arctic) · published May 12, 2025

Snowflake-Labs/sfguide-extracting-insights-from-video-with-multimodal-ai-analysis

Python

Captured source

Language: Python

License: Apache-2.0

Stars: 3

Forks: 7

Open issues: 0

Created: 2025-05-12T18:08:09Z

Pushed: 2026-06-02T09:05:54Z

Default branch: main

Fork: no

Archived: no

README:

Extracting Insights from Video with Multimodal AI Analysis

Overview

In this guide, we’ll take text-rich videos (instructional content, meetings) and extract still images and audio. We’ll then use Snowflake Cortex AI to run OCR on the images with PARSE_DOCUMENT and speech recognition using Whisper on the audio with AI_TRANSCRIBE. Next, we’ll process the video through Qwen2.5-VL on Snowpark Container Services (SPCS) to extract key moments and semantic events. Lastly, we’ll store the analysis from all three models in tables so that analytical queries about meeting productivity can be run on the data.
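As a rough illustration of the first step (extracting still images and audio from a video), the sketch below builds two ffmpeg command lines in Python. It is not taken from the guide: using ffmpeg at all, the helper name `build_ffmpeg_commands`, the file names, the 10-second sampling interval, and the mono 16 kHz WAV output are all assumptions made for the example. The QuickStart Guide may use different tooling and settings.

```python
from pathlib import Path


def build_ffmpeg_commands(video, out_dir, seconds_per_frame=10):
    """Return (frames_cmd, audio_cmd) argument lists for ffmpeg.

    Hypothetical helper: names, paths, and defaults are illustrative only.
    """
    video = Path(video)
    out_dir = Path(out_dir)
    stem = video.stem

    # One still image every `seconds_per_frame` seconds,
    # written as <stem>_0001.jpg, <stem>_0002.jpg, ...
    frames_cmd = [
        "ffmpeg", "-i", str(video),
        "-vf", f"fps=1/{seconds_per_frame}",
        str(out_dir / f"{stem}_%04d.jpg"),
    ]

    # Drop the video stream (-vn) and write mono (-ac 1), 16 kHz (-ar 16000) audio.
    audio_cmd = [
        "ffmpeg", "-i", str(video),
        "-vn", "-ac", "1", "-ar", "16000",
        str(out_dir / f"{stem}.wav"),
    ]
    return frames_cmd, audio_cmd


if __name__ == "__main__":
    # Print the commands instead of running them, so ffmpeg is not required here.
    for cmd in build_ffmpeg_commands("meeting.mp4", "extracted"):
        print(" ".join(cmd))
```

With ffmpeg installed and the output directory created, each list could be passed to `subprocess.run(cmd, check=True)`. The resulting images and audio file would then be uploaded to a Snowflake stage for the PARSE_DOCUMENT and AI_TRANSCRIBE steps.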

Step-by-Step Guide

For prerequisites, environment setup, step-by-step guide and instructions, please refer to the QuickStart Guide.

Dataset

This repository uses the AMI Meeting Corpus dataset:

  • Source: Edinburgh University (http://groups.inf.ed.ac.uk/ami/corpus/)
  • Citation: Carletta, J. et al. (2005). The AMI meeting corpus: A pre-announcement. In Proc. MLMI, pp. 28-39.
  • License: Creative Commons Attribution 4.0
  • Date Accessed: May 22, 2025

Notability

notability 1.0/10

Low-stars tutorial repo