Repo: Baidu (ERNIE), published May 8, 2020

PaddlePaddle/PaddleOCR

Python

Description: Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

Language: Python

License: Apache-2.0

Stars: 81778

Forks: 10723

Open issues: 214

Created: 2020-05-08T10:38:16Z

Pushed: 2026-06-11T03:52:05Z

Default branch: main

Fork: no

Archived: no

README:

PaddleOCR converts PDF documents and images into structured, LLM-ready data (JSON/Markdown) with industry-leading accuracy. With 70k+ Stars and trusted by top-tier projects like Dify, RAGFlow, and Cherry Studio, PaddleOCR is the bedrock for building intelligent RAG and Agentic applications.

🚀 Key Features

📄 Intelligent Document Parsing (LLM-Ready)

> *Transforming messy visuals into structured data for the LLM era.*

  • SOTA Document VLM: Featuring PaddleOCR-VL-1.6 (0.9B), the industry's leading lightweight vision-language model for document parsing. It achieves 96.3% accuracy on OmniDocBench v1.6, leads in text, formula, and table recognition, and shows significantly enhanced capabilities in ancient documents, rare characters, seals, and charts, with structured outputs in Markdown and JSON formats.
  • Structure-Aware Conversion: PP-StructureV3 seamlessly converts complex PDFs and images into Markdown or JSON. Unlike the PaddleOCR-VL series models, it provides finer-grained coordinate information, including table cell coordinates, text coordinates, and more.
  • Production-Ready Efficiency: Achieve commercial-grade accuracy with an ultra-small footprint. Outperforms numerous closed-source solutions in public benchmarks while remaining resource-efficient for edge/cloud deployment.
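The excerpt does not show PP-StructureV3's actual output schema, so the sketch below assumes a hypothetical block format (a `kind` label, the recognized `text`, and a `bbox` in pixel coordinates) to illustrate why coordinate-level output is useful: with boxes available, a downstream consumer can pick its own reading order before handing Markdown to an LLM. The `Block` class, `to_markdown`, and the `line_tol` bucketing are illustrative assumptions, not PaddleOCR APIs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical layout block; NOT PaddleOCR's real output schema."""
    kind: str  # assumed labels, e.g. "title" or "text"
    text: str
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1), origin at top-left


def to_markdown(blocks: list[Block], line_tol: float = 10.0) -> str:
    """Render blocks in a simple top-to-bottom, left-to-right reading order."""
    # Bucket blocks by their top edge (in units of line_tol) so that blocks on
    # roughly the same row are then ordered by their left edge.
    ordered = sorted(blocks, key=lambda b: (round(b.bbox[1] / line_tol), b.bbox[0]))
    parts = []
    for b in ordered:
        # Titles become Markdown headings; everything else is a plain paragraph.
        parts.append(f"# {b.text}" if b.kind == "title" else b.text)
    return "\n\n".join(parts)


if __name__ == "__main__":
    page = [
        Block("text", "First paragraph.", (50, 100, 500, 140)),
        Block("title", "Report", (50, 20, 300, 60)),
    ]
    print(to_markdown(page))
```

Bucketing by a rounded top edge is deliberately crude; multi-column pages generally need proper layout analysis rather than this heuristic.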

🔍 Universal Text Recognition (Scene OCR)

> *The global gold standard for high-speed, multilingual text spotting.*

  • 100+ Languages Supported: Native recognition across a broad range of languages and scripts. Our PP-OCRv5 single-model solution elegantly handles mixed multilingual documents (Chinese, English, Japanese, Pinyin, etc.).
  • Complex Element Mastery: Beyond standard text recognition, we support natural scene text spotting across a wide range of environments, including IDs, street views, books, and industrial components.
  • Performance Leap: PP-OCRv5 delivers a 13% accuracy boost over previous versions, maintaining the "Extreme Efficiency" that PaddleOCR is famous for.

🛠️ Developer-Centric Ecosystem

  • Seamless Integration: The premier choice for the AI Agent ecosystem—deeply integrated with Dify, RAGFlow, Pathway, and Cherry Studio.
  • LLM Data Flywheel: A complete pipeline to build high-quality datasets, providing a sustainable "Data Engine" for fine-tuning Large Language Models.
  • One-Click Deployment: Supports various hardware backends (NVIDIA GPU, Intel CPU, Kunlunxin XPU, and other AI accelerators).

📣 Recent updates

🔥 2026.05.28: Release of PaddleOCR 3.6.0

  • PaddleOCR-VL-1.6 highlights:
    • New SOTA Accuracy: Achieves over 96.3% on OmniDocBench v1.6 and sets new SOTA results on OmniDocBench v1.5 and Real5-OmniDocBench, leading both open-source and proprietary solutions in text, formula, and table recognition.
    • Comprehensive Capability Upgrade: Significant improvements in table, ancient document, and rare character recognition, with notably enhanced seal recognition, text spotting, and chart understanding across multiple scenarios.
    • Seamless Migration: The model architecture is fully consistent with PaddleOCR-VL-1.5, enabling zero-cost adaptation: swap and go.
    • Try it now: Available on HuggingFace or our Official Website.

2026.04.21: Release of PaddleOCR 3.5.0

  • Flexible inference backends: Seamlessly switch among Paddle static graph, Paddle dynamic graph, and Transformers. PaddleOCR is now deeply integrated with the Hugging Face ecosystem, and 20 major models support Transformers as the inference backend.
  • Office documents to Markdown: Convert common document formats such as Word, Excel, and PowerPoint into Markdown.
  • DOCX export for parsed results: The PaddleOCR-VL series, PP-StructureV3, and PP-DocTranslation now support exporting parsed results to DOCX for convenient viewing and editing in Microsoft Word.
  • Official browser inference SDK: Released PaddleOCR.js, which runs PP-OCRv5 directly in the browser.
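How PaddleOCR 3.5.0 maps Word, Excel, or PowerPoint content to Markdown is not described in this excerpt. As a hypothetical illustration of the target format only, the helper below renders a spreadsheet-like grid (assumed to be already loaded into Python lists, with the first row as the header) as a GitHub-flavored Markdown pipe table. `rows_to_markdown` is an illustrative name, not part of PaddleOCR.

```python
from __future__ import annotations


def rows_to_markdown(rows: list[list[object]]) -> str:
    """Render a 2-D grid (first row = header) as a GitHub-style pipe table.

    Hypothetical helper for illustration; not a PaddleOCR API.
    """
    if not rows:
        return ""
    width = max(len(r) for r in rows)

    def cell(value: object) -> str:
        # Empty cells for None; escape pipes and flatten newlines so each
        # cell stays on a single Markdown table line.
        text = "" if value is None else str(value)
        return text.replace("|", "\\|").replace("\n", " ")

    def line(row: list[object]) -> str:
        padded = list(row) + [""] * (width - len(row))  # pad ragged rows
        return "| " + " | ".join(cell(v) for v in padded) + " |"

    header, *body = rows
    separator = "| " + " | ".join(["---"] * width) + " |"
    return "\n".join([line(header), separator, *(line(r) for r in body)])


if __name__ == "__main__":
    print(rows_to_markdown([["Item", "Qty"], ["Pen", 3], ["Ink", None]]))
```

Pipe tables are a GitHub-flavored Markdown extension rather than core CommonMark, so the target renderer matters when choosing this format.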

2026.01.29: Release of PaddleOCR 3.4.0

  • PaddleOCR-VL-1.5 (SOTA 0.9B VLM): Our latest flagship model for document parsing is now live!
    • 94.5% Accuracy on OmniDocBench: Surpassing top-tier general large models and specialized document parsers.
    • Real-World Robustness: First to introduce the PP-DocLayoutV3 algorithm for irregular shape positioning, mastering 5 tough scenarios: *Skew, Warping, Scanning, Illumination, and Screen Photography*.
    • Capability Expansion: Now supports Seal Recognition, Text Spotting, and expands to 111 languages (including China’s Tibetan script and Bengali).
    • Long Document Mastery: Supports automatic cross-page table merging and hierarchical heading identification.
    • Try it now: Available on HuggingFace or our Official Website.
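The release note does not explain how cross-page table merging works. The sketch below is a deliberately naive heuristic, not PaddleOCR-VL-1.5's method: it assumes hypothetical `Table` fragments arriving in document order and merges a fragment into the previous one when it sits on the next page and has the same column count, dropping a repeated header row. `Table` and `merge_cross_page` are illustrative names only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical parsed table fragment; NOT PaddleOCR's schema."""
    page: int
    rows: list[list[str]] = field(default_factory=list)


def merge_cross_page(tables: list[Table]) -> list[Table]:
    """Naively merge table fragments that continue onto the next page."""
    merged: list[Table] = []
    last_page = None  # page of the fragment most recently folded into merged[-1]
    for t in tables:
        prev = merged[-1] if merged else None
        continues = (
            prev is not None
            and last_page is not None
            and t.page == last_page + 1
            and bool(prev.rows) and bool(t.rows)
            and len(t.rows[0]) == len(prev.rows[0])  # same column count
        )
        if continues:
            rows = t.rows
            if rows[0] == prev.rows[0]:
                rows = rows[1:]  # drop a header repeated on the new page
            prev.rows.extend(list(r) for r in rows)
        else:
            # Start a new table; copy rows so the input is not mutated.
            merged.append(Table(t.page, [list(r) for r in t.rows]))
        last_page = t.page
    return merged
```

A practical implementation would likely also check each fragment's position on the page (for example, that the first fragment reaches the bottom margin) before merging.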

2025.10.16: Release of PaddleOCR 3.3.0

  • Released PaddleOCR-VL:
    • Model Introduction:
      • PaddleOCR-VL is a SOTA and resource-efficient model tailored for document parsing. Its core component is PaddleOCR-VL-0.9B, a compact yet powerful vision-language model (VLM) that integrates a NaViT-style dynamic resolution visual encoder with the ERNIE-4.5-0.3B language model to enable accurate element recognition. This innovative model efficiently supports 109 languages and excels in recognizing complex elements (e.g., text, tables, formulas, and charts), while maintaining minimal resource consumption. Through comprehensive evaluations on widely used public benchmarks and in-house benchmarks, PaddleOCR-VL achieves SOTA performance in both page-level document parsing and element-level recognition. It significantly outperforms existing solutions, exhibits strong competitiveness against top-tier VLMs, and delivers fast inference speeds. These strengths make it highly suitable for practical deployment in real-world scenarios. The model has…

Notability

Baidu (ERNIE) has a repo signal matching data demand, product and customer.