Repo: Microsoft, published Mar 17, 2025

microsoft/content-processing-solution-accelerator

Python


Description: Programmatically extract data and apply schemas to unstructured documents across text-based and multi-modal content using Azure AI Foundry, Azure OpenAI, Azure AI Content Understanding, and Cosmos DB.

Language: Python

License: MIT

Stars: 226

Forks: 193

Open issues: 21

Created: 2025-03-17T16:10:24Z

Pushed: 2026-06-11T06:09:24Z

Default branch: main

Fork: no

Archived: no

README:

Content Processing Solution Accelerator

> [!WARNING]
> Important Update
> We've made major updates to Agentic Content Processing that include breaking changes. If you need the previous version, you can still find it in the v1 branch.

Process multi-document claims by extracting data from each document, applying schemas with confidence scoring, and generating AI-powered summaries and gap analysis across the entire claim. Upload multiple files — invoices, forms, images, contracts — to a single claim, and the solution automatically processes each document through a multi-modal content extraction pipeline, then orchestrates cross-document summarization and gap identification using an Agent Framework Workflow Engine.

The core content processing engine supports text, images, tables and graphs with schema-based transformation and confidence scoring. These capabilities can be applied to numerous use cases including: insurance claims processing, contract review, invoice processing, ID verification, and logistics shipment record processing.
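As a rough illustration of schema-based transformation with confidence scoring, the sketch below coerces loosely typed extracted values onto a target schema and carries a confidence value per field. The `INVOICE_SCHEMA` fields, the `apply_schema` helper, and the shape of the `extracted` dictionary are hypothetical and are not taken from the accelerator's code.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class MappedField:
    value: Any
    confidence: float


# Hypothetical schema: field name -> converter applied to the raw value.
INVOICE_SCHEMA: dict[str, Callable[[Any], Any]] = {
    "invoice_number": str,
    "total": float,
    "currency": str.upper,
}


def apply_schema(
    extracted: dict[str, dict[str, Any]],
    schema: dict[str, Callable[[Any], Any]],
) -> dict[str, MappedField]:
    """Map raw extracted values onto the schema, keeping per-field confidence.

    Fields that are missing or fail conversion get value None and confidence 0.0.
    """
    mapped: dict[str, MappedField] = {}
    for name, convert in schema.items():
        raw: Optional[dict[str, Any]] = extracted.get(name)
        if raw is None:
            mapped[name] = MappedField(None, 0.0)
            continue
        try:
            mapped[name] = MappedField(convert(raw["value"]), float(raw["confidence"]))
        except (TypeError, ValueError):
            mapped[name] = MappedField(None, 0.0)
    return mapped


# Illustrative input; real extraction output will look different.
extracted = {
    "invoice_number": {"value": "INV-001", "confidence": 0.98},
    "total": {"value": "1250.50", "confidence": 0.91},
}
result = apply_schema(extracted, INVOICE_SCHEMA)
```

Giving missing fields a confidence of zero, rather than omitting them, keeps the output shape stable so downstream checks can treat "absent" and "low confidence" the same way.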

---

[SOLUTION OVERVIEW](#solution-overview) | [QUICK DEPLOY](#quick-deploy) | [BUSINESS SCENARIO](#business-scenario) | [SUPPORTING DOCUMENTATION](#supporting-documentation)

---

Note: For any AI solution you create using these templates, you are responsible for assessing all associated risks and for complying with all applicable laws and safety standards. Learn more in the transparency documents for Agent Service and Agent Framework.

Solution overview

This accelerator leverages Azure AI Foundry, Azure AI Content Understanding Service, Azure OpenAI Service GPT-5.1, Azure Blob Storage, Azure Cosmos DB, and Azure Container Apps to process multi-document claims through a two-level architecture:

  • Claim Processing Workflow — Upload multiple documents to a claim via the Web UI. The Content Process Workflow (built on the Agent Framework Workflow Engine) orchestrates document extraction, AI-powered summarization, and gap analysis across all documents in the claim.
  • Content Processing Pipeline — The core engine (carried forward from v1) that processes each individual document through a 4-stage pipeline: Extract → Map → Evaluate → Save, with confidence scoring for extraction accuracy and schema mapping.
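The per-document Extract → Map → Evaluate → Save sequence can be pictured as a chain of functions that each take and return a document record. This is a minimal sketch under stated assumptions: the stage bodies are toy stand-ins (no Azure AI Content Understanding or Azure OpenAI calls are made), and every function name and dictionary key here is hypothetical.

```python
from typing import Any, Callable

Document = dict[str, Any]
Stage = Callable[[Document], Document]


def extract(doc: Document) -> Document:
    # Stand-in for OCR and layout extraction: just normalise whitespace.
    return {**doc, "text": doc["raw"].strip()}


def map_to_schema(doc: Document) -> Document:
    # Stand-in for model-driven schema mapping: parse "key: value" lines.
    fields = dict(
        line.split(": ", 1) for line in doc["text"].splitlines() if ": " in line
    )
    return {**doc, "fields": fields}


def evaluate(doc: Document) -> Document:
    # Toy score: fraction of expected fields that were actually found.
    expected = doc["expected_fields"]
    found = sum(1 for name in expected if name in doc["fields"])
    return {**doc, "score": found / len(expected)}


def make_save(store: dict[str, Document]) -> Stage:
    # Stand-in for persisting results; writes into an in-memory dict.
    def save(doc: Document) -> Document:
        store[doc["id"]] = doc
        return doc
    return save


def run_pipeline(doc: Document, stages: list[Stage]) -> Document:
    # Each stage receives the output of the previous one.
    for stage in stages:
        doc = stage(doc)
    return doc


store: dict[str, Document] = {}
document = {
    "id": "doc-1",
    "raw": "  policy: P-1\ndate: 2025-01-01\n",
    "expected_fields": ["policy", "amount"],
}
processed = run_pipeline(document, [extract, map_to_schema, evaluate, make_save(store)])
```

Keeping every stage to the same signature means stages can be reordered, replaced, or tested in isolation without touching the runner.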

Each processing, extraction, schema transformation, summarization, and gap analysis step is tracked with a status and scored for accuracy, so that processing can be automated and results that need human validation can be identified.
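One way to picture status tracking combined with accuracy scores is a per-step record plus a rule that decides when a person should look at the result. The sketch below is illustrative only: the `StepRecord` shape, the status values, and the `REVIEW_THRESHOLD` of 0.8 are assumptions, not values from the accelerator.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class StepStatus(str, Enum):
    PENDING = "pending"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class StepRecord:
    name: str
    status: StepStatus = StepStatus.PENDING
    score: Optional[float] = None  # None until the step has been scored


REVIEW_THRESHOLD = 0.8  # assumed cut-off; tune per use case


def needs_human_review(
    steps: list[StepRecord], threshold: float = REVIEW_THRESHOLD
) -> bool:
    """Flag a document if any step failed or any scored step is below threshold."""
    return any(
        step.status is StepStatus.FAILED
        or (step.score is not None and step.score < threshold)
        for step in steps
    )


steps = [
    StepRecord("extract", StepStatus.COMPLETED, 0.95),
    StepRecord("map", StepStatus.COMPLETED, 0.72),
    StepRecord("summarize"),  # still pending, not yet scored
]
flagged = needs_human_review(steps)
```

Here the document is flagged because the `map` step scored 0.72, below the assumed threshold; unscored pending steps do not trigger review on their own.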

Solution architecture

| ![image](./docs/images/readme/solution-architecture.png) |
|---|

Detailed architecture diagram:

graph TB
subgraph UserInterface["🖥️ User Interface"]
WEB["Claim Process Monitor Web<br/>(React / TypeScript / NGINX)<br/>ca-name-web"]
end

subgraph API_Layer["🔗 Content Process API — Gateway"]
API["Content Process API<br/>(FastAPI / Python)<br/>ca-name-api"]
API_WF["/claimprocessor/*<br/>Workflow Endpoints"]
API_CP["/contentprocessor/*<br/>Content Processor Endpoints"]
API_SV["/schemasetvault/*<br/>Schema Set Endpoints"]
API_SC["/schemavault/*<br/>Schema Endpoints"]
API --- API_WF
API --- API_CP
API --- API_SV
API --- API_SC
end

subgraph Queues["📨 Azure Storage Queues"]
Q_CLAIM["claim-process-queue"]
Q_DLQ["claim-process-dead-letter-queue"]
Q_EXTRACT["content-pipeline-extract-queue"]
end

subgraph Workflow["⚙️ Content Process Workflow — Agent Framework"]
WF["Content Process Workflow<br/>(Agent Framework Workflow Engine)<br/>ca-name-wkfl"]
WF_S1["Stage 1: Document Processing<br/>(DocumentProcessExecutor)<br/>Invokes Content Processor per document"]
WF_S2["Stage 2: Summarizing<br/>(SummarizeExecutor)<br/>AI summary across all docs"]
WF_S3["Stage 3: Gap Analysis<br/>(GapExecutor)<br/>AI gap identification"]
WF --> WF_S1 --> WF_S2 --> WF_S3
end

subgraph Processor["📄 Content Processor — 4-Stage Pipeline"]
CP["Content Processor<br/>(Python / Queue Worker)<br/>ca-name-app"]
CP_E["1. Extract<br/>(Azure AI Content Understanding)"]
CP_M["2. Map<br/>(GPT-5.1 Vision)"]
CP_V["3. Evaluate<br/>(Merge & Score)"]
CP_S["4. Save<br/>(Blob + Cosmos DB)"]
CP --> CP_E --> CP_M --> CP_V --> CP_S
end

subgraph AzureAI["🧠 Azure AI Services"]
AICU["Azure AI Content<br/>Understanding"]
AOAI["Azure OpenAI<br/>GPT-5.1"]
end

subgraph DataStores["💾 Data & Storage"]
BLOB["Azure Blob Storage<br/>Documents, Manifests, Results"]
COSMOS["Azure Cosmos DB<br/>Processes | Schemas | claimprocesses"]
end

subgraph Config["🔧 Configuration & Infrastructure"]
APPCONFIG["App Configuration"]
ACR["Container Registry"]
CAE["Container App Environment"]
LOG["Log Analytics"]
end

%% Main flow
WEB -->|"HTTP"| API
API_WF -->|"enqueue claim"| Q_CLAIM
Q_CLAIM -->|"dequeue"| WF
WF_S1 -->|"HTTP per document"| API_CP
API_CP -->|"enqueue document"| Q_EXTRACT
Q_EXTRACT -->|"dequeue"| CP
WF -->|"failed messages"| Q_DLQ

%% AI service connections
CP_E -->|"OCR & layout"| AICU
CP_M -->|"vision extraction"| AOAI
WF_S2 -->|"summarization"| AOAI
WF_S3 -->|"gap analysis"| AOAI

%% Data store connections
CP_S -->|"save results"| BLOB
CP_S -->|"save results"| COSMOS
WF -->|"claim status & results"| COSMOS
WF_S1 -->|"download manifest"| BLOB
API -->|"read/write"| COSMOS
API -->|"read/write"| BLOB

%% Config connections
API -.->|"settings"| APPCONFIG
WF -.->|"settings"| APPCONFIG
CP -.->|"settings"| APPCONFIG
ACR -.->|"images"| CAE
CAE -.-> LOG
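The diagram shows claims flowing through `claim-process-queue`, with failed messages routed to `claim-process-dead-letter-queue`. The sketch below imitates that hand-off with in-memory queues. It does not use the Azure Storage Queues SDK, and the retry budget (`MAX_ATTEMPTS`), the message shape, and the `drain` helper are assumptions made for illustration, since the source does not state when a message counts as failed.

```python
from collections import deque
from typing import Any, Callable

Message = dict[str, Any]

MAX_ATTEMPTS = 3  # assumed retry budget; not stated in the source


def drain(
    queue: deque,
    dead_letter: deque,
    handler: Callable[[Message], None],
) -> None:
    """Process messages until the queue is empty.

    A message whose handler raises is retried; after MAX_ATTEMPTS failures
    it is moved to the dead-letter queue instead of being re-enqueued.
    """
    while queue:
        message = queue.popleft()
        try:
            handler(message)
        except Exception:
            message["attempts"] = message.get("attempts", 0) + 1
            if message["attempts"] >= MAX_ATTEMPTS:
                dead_letter.append(message)
            else:
                queue.append(message)  # put back for a later retry


claim_queue: deque = deque([{"claim_id": "c-1"}, {"claim_id": "bad"}])
dead_letter_queue: deque = deque()
processed: list[str] = []


def handle_claim(message: Message) -> None:
    # Toy handler: one claim id always fails to simulate a poison message.
    if message["claim_id"] == "bad":
        raise ValueError("cannot process claim")
    processed.append(message["claim_id"])


drain(claim_queue, dead_letter_queue, handle_claim)
```

Separating poison messages into their own queue keeps one bad claim from blocking the rest and leaves it available for later inspection.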

Agentic architecture

The claim processing workflow is built on the Agent Framework's Workflow Engine — a DAG-based event-streaming execution model that orchestrates specialized AI agents across the claim lifecycle. Each stage is an autonomous Executor that receives context, performs its task, and passes results downstream.
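To make the DAG-based, event-streaming idea concrete, the sketch below orders three stand-in executors with the standard library's `graphlib.TopologicalSorter` and yields an event before and after each one. It does not use the Agent Framework API: the executor functions, the event dictionaries, and the context keys are all hypothetical, chosen only to mirror the three stages named in the architecture diagram.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Iterator

Context = dict[str, Any]


def process_documents(ctx: Context) -> Context:
    # Stand-in for per-document extraction.
    return {**ctx, "extracted": [f"data:{doc}" for doc in ctx["documents"]]}


def summarize(ctx: Context) -> Context:
    # Stand-in for an AI-generated summary across all documents.
    return {**ctx, "summary": f"{len(ctx['extracted'])} documents processed"}


def find_gaps(ctx: Context) -> Context:
    # Stand-in for gap analysis: required document types that are missing.
    missing = sorted(set(ctx["required"]) - set(ctx["documents"]))
    return {**ctx, "gaps": missing}


EXECUTORS: dict[str, Callable[[Context], Context]] = {
    "document_processing": process_documents,
    "summarizing": summarize,
    "gap_analysis": find_gaps,
}

# Each node maps to the set of nodes that must finish before it runs.
DAG: dict[str, set[str]] = {
    "document_processing": set(),
    "summarizing": {"document_processing"},
    "gap_analysis": {"summarizing"},
}


def run_workflow(context: Context) -> Iterator[dict[str, Any]]:
    """Run executors in dependency order, streaming start/complete events."""
    for name in TopologicalSorter(DAG).static_order():
        yield {"stage": name, "event": "started"}
        context = EXECUTORS[name](context)
        yield {"stage": name, "event": "completed", "context": context}


events = list(
    run_workflow(
        {"documents": ["invoice", "form"], "required": ["invoice", "form", "contract"]}
    )
)
final_context = events[-1]["context"]
```

Because the workflow is a generator, a caller can consume events as they arrive (for example to update a status display) instead of waiting for the whole claim to finish.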

![image](./docs/images/readme/agentic-architecture.png)

Detailed agentic architecture diagram:

flowchart TB
subgraph...

Excerpt shown — open the source for the full document.