Snowflake-Labs/sfguide-customer-issue-deduplication-demo
Jupyter Notebook
Captured source
source ↗Snowflake-Labs/sfguide-customer-issue-deduplication-demo
Description: Using Snowflake AISQL for entity identification and issue deduplication
Language: Jupyter Notebook
License: Apache-2.0
Stars: 2
Forks: 0
Open issues: 0
Created: 2025-10-17T02:14:14Z
Pushed: 2026-03-07T23:11:41Z
Default branch: main
Fork: no
Archived: no
README:
Smart Complaint deduplication using Snowflake-native AISQL
By Anant Damle & Nathan Birch
Story Link: https://medium.com/snowflake/smart-complaint-deduplication-using-snowflake-native-aisql-2bab5885e277
Customer issue management is an inevitable part of doing business in today’s world. Duplicate complaints remain a common challenge for organisations, often stemming from customers following up on similar issues through different channels and lack of reliable master account information, leading to frustrating experiences for both customers and support teams, and ultimately, skewed reporting.
Imagine a customer raising a complaint via a webform, then following up with a phone call about the same issue. Without a robust system, the on-call person might be unaware of the previous interaction, leading to redundant efforts and a lack of visibility into the true state of a customer's issue. The Challenge: unreliable matching, lost visibility and wasted resources.
Traditional methods often rely on name/entity resolution, extracting attributes like name, phone number, or email using regex or other forms of patterns and then match to related issues. However, these identifiers aren't always reliable. Account numbers or License Plate Numbers (LPNs) can be inconsistent, and even within a short timeframe (e.g., 5 days), the same issue might be treated as a new complaint. The result? Duplicate reports that obscure the real count of unique issues, hindering effective analysis and decision-making.
Architecture
A Snowflake-native solution designed to address these deduplication challenges, provides a clear and accurate picture of customer complaints.
---
config:
theme: 'default'
themeVariables:
darkMode: false
fontSize: 1.4rem
lineColor: blue
---
graph LR
%% --- Section: Raw ---
subgraph Raw
RSI[(Raw Support
Interactions)]
RMI[(Raw Master
Identities)]
end
%% --- Section: Enriched ---
subgraph enriched ["1. Enrichment"]
SE[SUMMARY +
ENTITY EXTRACT]
AIE1[AI_EMBED]
FX1{{fx}}
ESC[Enriched
Support Cases]
RSI -- Stream --> FX1
SE --> FX1
AIE1 -- Feature
Extraction --> FX1
FX1 --> ESC
AIE2[AI_EMBED]
FX2{{fx}}
EMI[Enriched
Master Identities]
RMI -- Stream --> FX2
AIE2 --> FX2
FX2 --> EMI
end
%% --- Section: Master Entity Resolution ---
subgraph entity_resolution ["2. Master Entity Resolution"]
JOIN(["X"])
CMA[Cases with
Master Account]
ESC --> JOIN
EMI --> JOIN
JOIN --> CMA
end
%% --- Section: Case Similarity Analysis ---
subgraph similarity_analysis ["3. Case Similarity Analysis"]
VS[Vector Similarity]
FX3{{fx}}
DC[Deduped
Cases]
CMA --> FX3
VS --> FX3
FX3 --> DC
end
%% --- Section: Automation ---
subgraph automation ["4. Automation"]
FX4{{fx}}
CRM[/CRM/ Case
Mgmt/]
DASH[Streamlit
Dashboard]
DC --> FX4
DC --> DASH
FX4 -- Update CRM --> CRM
end
%% --- Apply Class Styles to Nodes ---
classDef subgraphStyle fill:#99D1F5;
class RSI,RMI raw;
class SE,AIE1,AIE2,FX1,FX2,ESC,EMI,JOIN,CMA,VS,FX3,DC,FX4,CRM,DASH main;
class NOTE note;
class Raw,enriched,entity_resolution,similarity_analysis,automation subgraphStyle;1. Preprocessing/Enrichment: Extract entities and summarise the case using Snowflake Cortex 2. Master Entity Resolution: Associate cases to an account id through a combination of deterministic, fuzzy and vector matching techniques. 3. Similarity Analysis & Deduplication: Compares cases within the same master account to generate a similarity score and explanation, ensuring efficient and accurate deduplication. 4. Orchestration & Automation: You can automatically update CRM from Snowflake using stored procedures or a Streamlit-based dashboard to visualise the results. These will need to be built and are not included in the code artifacts.
Objectives
Key Snowflake AISQL Functions Covered:
`AI_COMPLETE` – Generates structured data from unstructured text using prompts, and provides content similarity analysis with explanations.
`AI_EMBED` – Converts text into vector embeddings, enabling semantic search and matching based on meaning.
`VECTOR_COSINE_SIMILARITY` – Compares vectors generated by AI\_EMBED to identify related content through semantic similarity.
`JAROWINKLER_SIMILARITY` – Performs fuzzy string matching, useful for handling names or identifiers with typos or slight variations.
Costs
This solution uses Snowflake Cortex AISQL functions which incur compute costs based on the number of tokens processed. Refer to the details for each function’s cost in credits.
Before you begin
Create a Snowflake notebook from the Git repository using these steps.
> Use the conversion script to convert the Snowflake notebook to a regular > Jupyter notebook with snowsql cell magic, using following command: > ``shell > python convert_snowflake_notebook.py ISSUE_DEDUP_NOTEBOOK.ipynb ISSUE_DEDUP_NOTEBOOK_snowsql.ipynb >
Preprocessing/Enrichment
The initial step involves extracting entities from raw support transcripts and summarizing them. This optimizes costs by reducing the number of tokens required for vectorization but also creates a valuable knowledge base for quick search capabilities. The solution leverages Snowflake Cortex's `AI_COMPLETE` function to extract specific information (sentiment, summary, contacts) from unstructured text, even for entities that don't follow strict patterns. This enriched data is then vectorized using…
Excerpt shown — open the source for the full document.
Notability
notability 3.0/10Routine demo repo with minimal traction