microsoft/content-processing-solution-accelerator
Python
Description: Programmatically extract data and apply schemas to unstructured documents across text-based and multi-modal content using Azure AI Foundry, Azure OpenAI, Azure AI Content Understanding, and Cosmos DB.
Language: Python
License: MIT
Stars: 226
Forks: 193
Open issues: 21
Created: 2025-03-17T16:10:24Z
Pushed: 2026-06-11T06:09:24Z
Default branch: main
Fork: no
Archived: no
README:
Content Processing Solution Accelerator
> [!WARNING]
> **Important Update**
> We've made major updates to Agentic Content Processing that include breaking changes. If you need the previous version, you can still find it in the v1 branch.
Process multi-document claims by extracting data from each document, applying schemas with confidence scoring, and generating AI-powered summaries and gap analysis across the entire claim. Upload multiple files — invoices, forms, images, contracts — to a single claim, and the solution automatically processes each document through a multi-modal content extraction pipeline, then orchestrates cross-document summarization and gap identification using an Agent Framework Workflow Engine.
The core content processing engine supports text, images, tables and graphs with schema-based transformation and confidence scoring. These capabilities can be applied to numerous use cases including: insurance claims processing, contract review, invoice processing, ID verification, and logistics shipment record processing.
---
[SOLUTION OVERVIEW](#solution-overview) | [QUICK DEPLOY](#quick-deploy) | [BUSINESS SCENARIO](#business-scenario) | [SUPPORTING DOCUMENTATION](#supporting-documentation)
---
Note: With any AI solutions you create using these templates, you are responsible for assessing all associated risks and for complying with all applicable laws and safety standards. Learn more in the transparency documents for Agent Service and Agent Framework.
Solution overview
This accelerator leverages Azure AI Foundry, Azure AI Content Understanding Service, Azure OpenAI Service GPT-5.1, Azure Blob Storage, Azure Cosmos DB, and Azure Container Apps to process multi-document claims through a two-level architecture:
- Claim Processing Workflow — Upload multiple documents to a claim via the Web UI. The Content Process Workflow (built on the Agent Framework Workflow Engine) orchestrates document extraction, AI-powered summarization, and gap analysis across all documents in the claim.
- Content Processing Pipeline — The core engine (carried forward from v1) that processes each individual document through a 4-stage pipeline: Extract → Map → Evaluate → Save, with confidence scoring for extraction accuracy and schema mapping.
Each step (processing, extraction, schema transformation, summarization, and gap analysis) is tracked with a status and scored for accuracy, so that processing can be automated and results that need human validation can be identified.
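To make the Extract → Map → Evaluate → Save flow concrete, here is a minimal, self-contained sketch of a staged pipeline with per-stage status tracking and a confidence check. It is not the accelerator's code: `DocumentState`, `REVIEW_THRESHOLD`, `STORE`, the stage functions, and the hard-coded text and confidence values are all hypothetical stand-ins for the Azure AI Content Understanding, Azure OpenAI, Blob Storage, and Cosmos DB calls.

```python
from dataclasses import dataclass, field

# Hypothetical threshold: the accelerator's real scoring rules are not shown here.
REVIEW_THRESHOLD = 0.8

# In-memory stand-in for Blob Storage / Cosmos DB.
STORE: dict = {}


@dataclass
class DocumentState:
    name: str
    raw_text: str = ""
    fields: dict = field(default_factory=dict)  # field name -> (value, confidence)
    status: list = field(default_factory=list)  # names of completed stages
    needs_review: bool = False


def extract(doc: DocumentState) -> DocumentState:
    # Stand-in for OCR and layout extraction.
    doc.raw_text = "invoice_number: INV-001\ntotal: 1250.00"
    return doc


def map_to_schema(doc: DocumentState) -> DocumentState:
    # Stand-in for model-based schema mapping; confidences are hard-coded.
    confidences = {"invoice_number": 0.97, "total": 0.62}
    for line in doc.raw_text.splitlines():
        key, value = line.split(": ", 1)
        doc.fields[key] = (value, confidences.get(key, 0.0))
    return doc


def evaluate(doc: DocumentState) -> DocumentState:
    # Flag the document when any field falls below the review threshold.
    doc.needs_review = any(
        conf < REVIEW_THRESHOLD for _, conf in doc.fields.values()
    )
    return doc


def save(doc: DocumentState) -> DocumentState:
    STORE[doc.name] = doc
    return doc


STAGES = [
    ("Extract", extract),
    ("Map", map_to_schema),
    ("Evaluate", evaluate),
    ("Save", save),
]


def run_pipeline(doc: DocumentState) -> DocumentState:
    for stage_name, stage in STAGES:
        doc = stage(doc)
        doc.status.append(stage_name)  # record progress after each stage
    return doc


result = run_pipeline(DocumentState(name="invoice.pdf"))
print(result.status, result.needs_review)
```

In this sketch the `total` field scores below the threshold, so the document is saved but marked for human validation rather than rejected.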
Solution architecture
Detailed architecture diagram:

```mermaid
graph TB
    subgraph UserInterface["🖥️ User Interface"]
        WEB["Claim Process Monitor Web (React / TypeScript / NGINX) ca-name-web"]
    end

    subgraph API_Layer["🔗 Content Process API — Gateway"]
        API["Content Process API (FastAPI / Python) ca-name-api"]
        API_WF["/claimprocessor/* Workflow Endpoints"]
        API_CP["/contentprocessor/* Content Processor Endpoints"]
        API_SV["/schemasetvault/* Schema Set Endpoints"]
        API_SC["/schemavault/* Schema Endpoints"]
        API --- API_WF
        API --- API_CP
        API --- API_SV
        API --- API_SC
    end

    subgraph Queues["📨 Azure Storage Queues"]
        Q_CLAIM["claim-process-queue"]
        Q_DLQ["claim-process-dead-letter-queue"]
        Q_EXTRACT["content-pipeline-extract-queue"]
    end

    subgraph Workflow["⚙️ Content Process Workflow — Agent Framework"]
        WF["Content Process Workflow (Agent Framework Workflow Engine) ca-name-wkfl"]
        WF_S1["Stage 1: Document Processing (DocumentProcessExecutor) Invokes Content Processor per document"]
        WF_S2["Stage 2: Summarizing (SummarizeExecutor) AI summary across all docs"]
        WF_S3["Stage 3: Gap Analysis (GapExecutor) AI gap identification"]
        WF --> WF_S1 --> WF_S2 --> WF_S3
    end

    subgraph Processor["📄 Content Processor — 4-Stage Pipeline"]
        CP["Content Processor (Python / Queue Worker) ca-name-app"]
        CP_E["1. Extract (Azure AI Content Understanding)"]
        CP_M["2. Map (GPT-5.1 Vision)"]
        CP_V["3. Evaluate (Merge & Score)"]
        CP_S["4. Save (Blob + Cosmos DB)"]
        CP --> CP_E --> CP_M --> CP_V --> CP_S
    end

    subgraph AzureAI["🧠 Azure AI Services"]
        AICU["Azure AI Content Understanding"]
        AOAI["Azure OpenAI GPT-5.1"]
    end

    subgraph DataStores["💾 Data & Storage"]
        BLOB["Azure Blob Storage Documents, Manifests, Results"]
        COSMOS["Azure Cosmos DB Processes | Schemas | claimprocesses"]
    end

    subgraph Config["🔧 Configuration & Infrastructure"]
        APPCONFIG["App Configuration"]
        ACR["Container Registry"]
        CAE["Container App Environment"]
        LOG["Log Analytics"]
    end

    %% Main flow
    WEB -->|"HTTP"| API
    API_WF -->|"enqueue claim"| Q_CLAIM
    Q_CLAIM -->|"dequeue"| WF
    WF_S1 -->|"HTTP per document"| API_CP
    API_CP -->|"enqueue document"| Q_EXTRACT
    Q_EXTRACT -->|"dequeue"| CP
    WF -->|"failed messages"| Q_DLQ

    %% AI service connections
    CP_E -->|"OCR & layout"| AICU
    CP_M -->|"vision extraction"| AOAI
    WF_S2 -->|"summarization"| AOAI
    WF_S3 -->|"gap analysis"| AOAI

    %% Data store connections
    CP_S -->|"save results"| BLOB
    CP_S -->|"save results"| COSMOS
    WF -->|"claim status & results"| COSMOS
    WF_S1 -->|"download manifest"| BLOB
    API -->|"read/write"| COSMOS
    API -->|"read/write"| BLOB

    %% Config connections
    API -.->|"settings"| APPCONFIG
    WF -.->|"settings"| APPCONFIG
    CP -.->|"settings"| APPCONFIG
    ACR -.->|"images"| CAE
    CAE -.-> LOG
```
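The queue hand-offs in the architecture (claim queue → workflow → extract queue, with failed messages routed to a dead-letter queue) can be illustrated with an in-memory sketch. This uses Python's standard `queue` module rather than Azure Storage Queues, and it skips the HTTP hop through the `/contentprocessor/*` endpoints. The message shape, `MAX_ATTEMPTS`, and the retry-then-dead-letter policy are assumptions for illustration only; the README states only that failed messages go to the dead-letter queue.

```python
import queue

MAX_ATTEMPTS = 3  # hypothetical retry limit, not taken from the accelerator

claim_queue = queue.Queue()        # stands in for claim-process-queue
dead_letter_queue = queue.Queue()  # stands in for claim-process-dead-letter-queue
extract_queue = queue.Queue()      # stands in for content-pipeline-extract-queue


def enqueue_claim(claim_id: str, documents: list) -> None:
    # Hypothetical message shape: one message per claim.
    claim_queue.put({"claim_id": claim_id, "documents": documents, "attempts": 0})


def process_claim(message: dict) -> None:
    # Fan out: one extract-queue message per document in the claim.
    if not message["documents"]:
        raise ValueError("claim has no documents")
    for doc in message["documents"]:
        extract_queue.put({"claim_id": message["claim_id"], "document": doc})


def drain_claims() -> None:
    while not claim_queue.empty():
        message = claim_queue.get()
        try:
            process_claim(message)
        except ValueError:
            message["attempts"] += 1
            if message["attempts"] >= MAX_ATTEMPTS:
                dead_letter_queue.put(message)  # give up and park the message
            else:
                claim_queue.put(message)        # retry later


enqueue_claim("claim-1", ["invoice.pdf", "form.png"])
enqueue_claim("claim-2", [])  # will fail on every attempt
drain_claims()
```

After `drain_claims()` returns, the two documents of `claim-1` sit on the extract queue, while `claim-2` has exhausted its attempts and is parked on the dead-letter queue for inspection.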
Agentic architecture
The claim processing workflow is built on the Agent Framework's Workflow Engine — a DAG-based event-streaming execution model that orchestrates specialized AI agents across the claim lifecycle. Each stage is an autonomous Executor that receives context, performs its task, and passes results downstream.
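The executor pattern described above (receive context, do one task, pass results downstream) can be sketched in plain Python. This does not use the Agent Framework API and omits event streaming. The class names are borrowed from the architecture diagram, but their bodies, the `ClaimContext` type, `run_workflow`, and the `REQUIRED_TYPES` rule are hypothetical placeholders for the AI-backed stages. A linear chain is used as the simplest case of a DAG.

```python
from dataclasses import dataclass, field


@dataclass
class ClaimContext:
    claim_id: str
    documents: dict  # file name -> document type
    extracted: dict = field(default_factory=dict)
    summary: str = ""
    gaps: list = field(default_factory=list)


class DocumentProcessExecutor:
    def run(self, ctx: ClaimContext) -> ClaimContext:
        # Stand-in for invoking the content processor once per document.
        ctx.extracted = {
            name: {"type": doc_type} for name, doc_type in ctx.documents.items()
        }
        return ctx


class SummarizeExecutor:
    def run(self, ctx: ClaimContext) -> ClaimContext:
        # Stand-in for an AI-generated summary across all documents.
        types = sorted(d["type"] for d in ctx.extracted.values())
        ctx.summary = (
            f"Claim {ctx.claim_id}: {len(types)} document(s): {', '.join(types)}"
        )
        return ctx


class GapExecutor:
    # Hypothetical rule: document types every claim is expected to contain.
    REQUIRED_TYPES = ("invoice", "claim_form")

    def run(self, ctx: ClaimContext) -> ClaimContext:
        present = {d["type"] for d in ctx.extracted.values()}
        ctx.gaps = [t for t in self.REQUIRED_TYPES if t not in present]
        return ctx


def run_workflow(ctx: ClaimContext, executors: list) -> ClaimContext:
    # Each executor receives the context and hands its result to the next one.
    for executor in executors:
        ctx = executor.run(ctx)
    return ctx


ctx = run_workflow(
    ClaimContext("claim-1", {"invoice.pdf": "invoice", "photo.png": "image"}),
    [DocumentProcessExecutor(), SummarizeExecutor(), GapExecutor()],
)
print(ctx.summary)
print(ctx.gaps)
```

Keeping each stage behind the same `run(ctx)` shape means stages can be reordered or replaced without touching the loop that drives them.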

Click to view detailed agentic architecture diagram
flowchart TB subgraph...
Excerpt shown — open the source for the full document.