RepoSnowflake (Arctic)Snowflake (Arctic)published Oct 17, 2025seen 5d

Snowflake-Labs/sfguide-customer-issue-deduplication-demo

Jupyter Notebook

Open original ↗

Captured source

source ↗

Snowflake-Labs/sfguide-customer-issue-deduplication-demo

Description: Using Snowflake AISQL for entity identification and issue deduplication

Language: Jupyter Notebook

License: Apache-2.0

Stars: 2

Forks: 0

Open issues: 0

Created: 2025-10-17T02:14:14Z

Pushed: 2026-03-07T23:11:41Z

Default branch: main

Fork: no

Archived: no

README:

Smart Complaint deduplication using Snowflake-native AISQL

By Anant Damle & Nathan Birch

Story Link: https://medium.com/snowflake/smart-complaint-deduplication-using-snowflake-native-aisql-2bab5885e277

Customer issue management is an inevitable part of doing business in today’s world. Duplicate complaints remain a common challenge for organisations, often stemming from customers following up on similar issues through different channels and lack of reliable master account information, leading to frustrating experiences for both customers and support teams, and ultimately, skewed reporting.

Imagine a customer raising a complaint via a webform, then following up with a phone call about the same issue. Without a robust system, the on-call person might be unaware of the previous interaction, leading to redundant efforts and a lack of visibility into the true state of a customer's issue. The Challenge: unreliable matching, lost visibility and wasted resources.

Traditional methods often rely on name/entity resolution, extracting attributes like name, phone number, or email using regex or other forms of patterns and then match to related issues. However, these identifiers aren't always reliable. Account numbers or License Plate Numbers (LPNs) can be inconsistent, and even within a short timeframe (e.g., 5 days), the same issue might be treated as a new complaint. The result? Duplicate reports that obscure the real count of unique issues, hindering effective analysis and decision-making.

Architecture

A Snowflake-native solution designed to address these deduplication challenges, provides a clear and accurate picture of customer complaints.

---
config:
theme: 'default'
themeVariables:
darkMode: false
fontSize: 1.4rem
lineColor: blue
---
graph LR
%% --- Section: Raw ---
subgraph Raw
RSI[(Raw Support
Interactions)]
RMI[(Raw Master
Identities)]

end

%% --- Section: Enriched ---
subgraph enriched ["1. Enrichment"]
SE[SUMMARY +
ENTITY EXTRACT]
AIE1[AI_EMBED]
FX1{{fx}}
ESC[Enriched
Support Cases]

RSI -- Stream --> FX1
SE --> FX1
AIE1 -- Feature
Extraction --> FX1
FX1 --> ESC

AIE2[AI_EMBED]
FX2{{fx}}
EMI[Enriched
Master Identities]

RMI -- Stream --> FX2
AIE2 --> FX2
FX2 --> EMI
end

%% --- Section: Master Entity Resolution ---
subgraph entity_resolution ["2. Master Entity Resolution"]
JOIN(["X"])
CMA[Cases with
Master Account]

ESC --> JOIN
EMI --> JOIN
JOIN --> CMA
end

%% --- Section: Case Similarity Analysis ---
subgraph similarity_analysis ["3. Case Similarity Analysis"]
VS[Vector Similarity]
FX3{{fx}}
DC[Deduped
Cases]

CMA --> FX3
VS --> FX3
FX3 --> DC
end

%% --- Section: Automation ---
subgraph automation ["4. Automation"]
FX4{{fx}}
CRM[/CRM/ Case
Mgmt/]
DASH[Streamlit
Dashboard]

DC --> FX4
DC --> DASH
FX4 -- Update CRM --> CRM
end

%% --- Apply Class Styles to Nodes ---
classDef subgraphStyle fill:#99D1F5;

class RSI,RMI raw;
class SE,AIE1,AIE2,FX1,FX2,ESC,EMI,JOIN,CMA,VS,FX3,DC,FX4,CRM,DASH main;
class NOTE note;
class Raw,enriched,entity_resolution,similarity_analysis,automation subgraphStyle;

1. Preprocessing/Enrichment: Extract entities and summarise the case using Snowflake Cortex 2. Master Entity Resolution: Associate cases to an account id through a combination of deterministic, fuzzy and vector matching techniques. 3. Similarity Analysis & Deduplication: Compares cases within the same master account to generate a similarity score and explanation, ensuring efficient and accurate deduplication. 4. Orchestration & Automation: You can automatically update CRM from Snowflake using stored procedures or a Streamlit-based dashboard to visualise the results. These will need to be built and are not included in the code artifacts.

Objectives

Key Snowflake AISQL Functions Covered:

`AI_COMPLETE` – Generates structured data from unstructured text using prompts, and provides content similarity analysis with explanations.

`AI_EMBED` – Converts text into vector embeddings, enabling semantic search and matching based on meaning.

`VECTOR_COSINE_SIMILARITY` – Compares vectors generated by AI\_EMBED to identify related content through semantic similarity.

`JAROWINKLER_SIMILARITY` – Performs fuzzy string matching, useful for handling names or identifiers with typos or slight variations.

Costs

This solution uses Snowflake Cortex AISQL functions which incur compute costs based on the number of tokens processed. Refer to the details for each function’s cost in credits.

Before you begin

Create a Snowflake notebook from the Git repository using these steps.

> Use the conversion script to convert the Snowflake notebook to a regular > Jupyter notebook with snowsql cell magic, using following command: > ``shell > python convert_snowflake_notebook.py ISSUE_DEDUP_NOTEBOOK.ipynb ISSUE_DEDUP_NOTEBOOK_snowsql.ipynb >

Preprocessing/Enrichment

The initial step involves extracting entities from raw support transcripts and summarizing them. This optimizes costs by reducing the number of tokens required for vectorization but also creates a valuable knowledge base for quick search capabilities. The solution leverages Snowflake Cortex's `AI_COMPLETE` function to extract specific information (sentiment, summary, contacts) from unstructured text, even for entities that don't follow strict patterns. This enriched data is then vectorized using…

Excerpt shown — open the source for the full document.

Notability

notability 3.0/10

Routine demo repo with minimal traction