zai-org/GLM-OCR
Python
Captured source
source ↗zai-org/GLM-OCR
Description: GLM-OCR: Accurate × Fast × Comprehensive
Language: Python
License: Apache-2.0
Stars: 6935
Forks: 638
Open issues: 40
Created: 2026-02-02T12:59:43Z
Pushed: 2026-04-21T08:52:11Z
Default branch: main
Fork: no
Archived: no
README:
GLM-OCR
[中文阅读](README_zh.md)
👋 Join our WeChat and Discord community
📖 Check out the GLM-OCR technical report
📍 Use GLM-OCR's API
Model Introduction
GLM-OCR is a multimodal OCR model for complex document understanding, built on the GLM-V encoder–decoder architecture. It introduces Multi-Token Prediction (MTP) loss and stable full-task reinforcement learning to improve training efficiency, recognition accuracy, and generalization. The model integrates the CogViT visual encoder pre-trained on large-scale image–text data, a lightweight cross-modal connector with efficient token downsampling, and a GLM-0.5B language decoder. Combined with a two-stage pipeline of layout analysis and parallel recognition based on PP-DocLayout-V3, GLM-OCR delivers robust and high-quality OCR performance across diverse document layouts.
Key Features
- State-of-the-Art Performance: Achieves a score of 94.62 on OmniDocBench V1.5, ranking #1 overall, and delivers state-of-the-art results across major document understanding benchmarks, including formula recognition, table recognition, and information extraction.
- Optimized for Real-World Scenarios: Designed and optimized for practical business use cases, maintaining robust performance on complex tables, code-heavy documents, seals, and other challenging real-world layouts.
- Efficient Inference: With only 0.9B parameters, GLM-OCR supports deployment via vLLM, SGLang, and Ollama, significantly reducing inference latency and compute cost, making it ideal for high-concurrency services and edge deployments.
- Easy to Use: Fully open-sourced and equipped with a comprehensive SDK and inference toolchain, offering simple installation, one-line invocation, and smooth integration into existing production pipelines.
News & Updates
- [2026.3.12] GLM-OCR SDK now supports agent-friendly Skill mode — just
pip install glmocr+ set API key, ready to use via CLI or Python with no GPU or YAML config needed. See: [GLM-OCR Skill](skills/glmocr/SKILL.md) - [2026.3.12] GLM-OCR Technical Report is now available. See: GLM-OCR Technical Report
- [2026.2.12] Fine-tuning tutorial based on LLaMA-Factory is now available. See: [GLM-OCR Fine-tuning Guide](examples/finetune/README.md)
Download Model
| Model | Download Links | Precision | | ------- | --------------------------------------------------------------------------------------------------------------------------- | --------- | | GLM-OCR | 🤗 Hugging Face 🤖 ModelScope | BF16 |
GLM-OCR SDK
We provide an SDK for using GLM-OCR more efficiently and conveniently.
Install SDK
Choose the lightest installation that matches your scenario:
# Cloud / MaaS + local images / PDFs (fastest install) pip install glmocr # Self-hosted pipeline (layout detection) pip install "glmocr[selfhosted]" # Flask service support pip install "glmocr[server]"
Install from source for development:
# Install from source git clone https://github.com/zai-org/glm-ocr.git cd glm-ocr uv venv --python 3.12 --seed && source .venv/bin/activate uv pip install -e .
Model Deployment
Two ways to use GLM-OCR:
Option 1: Zhipu MaaS API (Recommended for Quick Start)
Use the hosted cloud API – no GPU needed. The cloud service runs the complete GLM-OCR pipeline internally, so the SDK simply forwards your request and returns the result.
1. Get an API key from https://open.bigmodel.cn 2. Configure config.yaml:
pipeline: maas: enabled: true # Enable MaaS mode api_key: your-api-key # Required
That's it! When maas.enabled=true, the SDK acts as a thin wrapper that:
- Forwards your documents to the Zhipu cloud API
- Returns the results directly (Markdown + JSON layout details)
- No local processing, no GPU required
Input note (MaaS): the upstream API accepts file as a URL or a data:;base64,... data URI. If you have raw base64 without the data: prefix, wrap it as a data URI (recommended). The SDK will auto-wrap local file paths / bytes / raw base64 into a data URI when calling MaaS.
API documentation: https://docs.bigmodel.cn/cn/guide/models/vlm/glm-ocr
Option 2: Self-host with vLLM / SGLang
Deploy the GLM-OCR model locally for full control. The SDK provides the complete pipeline: layout detection, parallel region OCR, and result formatting.
Install the self-hosted extra first:
pip install "glmocr[selfhosted]"
##### Using vLLM
Install vLLM:
docker pull vllm/vllm-openai:v0.19.0-ubuntu2404
Or using with pip:
pip install -U "vllm>=0.19.0"
Launch the service:
pip install "transformers>=5.3.0"
vllm serve zai-org/GLM-OCR --port 8080 --speculative-config '{"method": "mtp", "num_speculative_tokens": 3}' --served-model-name glm-ocr>Note Add --max-model-len and --gpu-memory-utilization according to Your own machine to handle large image/pdf
##### Using SGLang
Install SGLang:
docker pull lmsysorg/sglang:v0.5.10
Or using with pip:
pip install "sglang>=0.5.10"
Launch the service:
SGLANG_ENABLE_SPEC_V2=1 sglang serve --model-path zai-org/GLM-OCR --port 8080 --speculative-algorithm NEXTN --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 --served-model-name glm-ocr
>Note Add --context-len and --mem-fraction-static according to Your own machine to handle large image/pdf
Option 3: Ollama/MLX
For specialized deployment scenarios, see the detailed guides:
- [Apple Silicon with mlx-vlm](examples/mlx-deploy/README.md) - Optimized for Apple Silicon Macs
- [Ollama Deployment](examples/ollama-deploy/README.md) - Simple local deployment with Ollama
Option 4: SDK Server + Client (GPU-less Client)
Deploy the SDK Server on a GPU machine, then use any machine as a client — no GPU needed on the client side. The client connects via the MaaS-compatible protocol, pointing api_url at your self-hosted server.
# Client config.yaml pipeline: maas: enabled: true…
Excerpt shown — open the source for the full document.
Notability
notability 8.0/10High traction notable OCR model release