inclusionAI/LLaDA2.1-mini
🚀 LLaDA2.1-flash is now live on ZenmuxAI! Try it via API 🛠️ or Chat 💬: https://zenmux.ai/inclusionai/llada2.1-flash
LLaDA2.1-mini is a diffusion language model in the LLaDA series that adds an editing enhancement. It significantly improves inference speed while delivering strong task performance.
---
Model Performance
| Benchmark | Qwen3-8B (no_think) Score | Ling-mini-2.0 Score | LLaDA2.0-mini Score | LLaDA2.0-mini TPF | LLaDA2.1-mini (S Mode) Score | LLaDA2.1-mini (S Mode) TPF | LLaDA2.1-mini (Q Mode) Score | LLaDA2.1-mini (Q Mode) TPF |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Average | 61.59 | 64.72 | 63.39 | 2.60 | 62.07 | 5.34 | 63.90 | 3.12 |
| Knowledge | | | | | | | | |
| GPQA | 48.01 | 59.41 | 47.76 | 2.73 | 48.36 | 3.62 | 53.28 | 2.12 |
| MMLU-Pro | 65.83 | 67.18 | 64.27 | 2.15 | 63.42 | 4.22 | 64.84 | 2.41 |
| C-EVAL | 80.6 | 82.17 | 81.80 | 1.78 | 78.40 | 3.39 | 78.59 | 1.91 |
| PHYBench | 9.76 | 14.59 | 11.70 | 2.48 | 12.75 | 4.41 | 13.05 | 2.52 |
| TriviaQA | 52.51 | 55.63 | 51.33 | 1.54 | 53.33 | 3.21 | 54.24 | 2.02 |
| Reasoning | | | | | | | | |
| BIG-Bench Hard | 79.48 | 83.70 | 78.21 | 2.36 | 78.42 | 5.02 | 80.58 | 2.86 |
| BIG-Bench Extra Hard | 18.27 | 14.81 | 16.47 | 2.03 | 15.30 | 3.19 | 15.78 | 1.66 |
| bbh-zh | 80.09 | 66.11 | 75.75 | 2.77 | 67.65 | 3.89 | 70.40 | 2.35 |
| MuSR | 70.02 | 71.36 | 71.48 | 1.45 | 70.43 | 2.48 | 71.89 | 1.56 |
| ZebraLogic | 37.48 | 79.85 | 64.20 | 2.30 | 68.50 | 5.38 | 77.10 | 2.93 |
| PrOntoQA | 93.12 | 96.06 | 86.00 | 2.36 | 87.50 | 4.86 | 84.50 | 2.73 |
| PIQA | 88.30 | 87.54 | 86.51 | 1.45 | 84.87 | 2.59 | 86.89 | 1.45 |
| OCNLI | 61.49 | 60.17 | 64.51 | 4.06 | 61.02 | 1.78 | 61.59 | 1.23 |
| HellaSwag | 79.56 | 69.02 | 79.01 | 1.50 | 75.71 | 2.39 | 76.19 | 1.49 |
| KOR-Bench | 54.96 | 63.2 | 49.92 | 2.45 | 46.64 | 4.28 | 48.00 | 2.35 |
| DROP | 84.56 | 78.80 | 81.89 | 2.02 | 81.55 | 5.84 | 82.37 | 2.87 |
| SQuAD 2.0 | 85.21 | 75.56 | 86.50 | 2.47 | 84.51 | 4.33 | 85.13 | 3.09 |
| Coding | | | | | | | | |
| LiveCodeBench | 26.76 | 42.29 | 31.83 | 3.34 | 28.85 | 6.42 | 30.40 | 3.63 |
| CRUXEval-O | 74.06 | 76.12 | 71.62 | 2.78 | 70.62 | 5.85 | 73.75 | 3.35 |
| MBPP+ | 72.69 | 77.25 | 78.24 | 3.43 | 73.28 | 10.59 | 74.07 | 6.30 |
| HumanEval+ | 79.5 | 80.03 | 81.40 | 5.16 | 80.49 | 12.32 | 82.93 | 7.77 |
| MultiPL-E | 61.70 | 67.09 | 67.46 | 2.78 | 64.16 | 7.23 | 67.17 | 4.01 |
| BigCodeBench-Full | 36.05 | 35.00 | 32.89 | 2.87 | 30.18 | 7.33 | 34.39 | 4.09 |
| BIRD-SQL | 36.11 | 39.67 | 39.34 | 1.96 | 37.32 | 4.48 | 38.40 | 2.42 |
| Spider | 72.80 | 76.43 | 76.76 | 3.93 | 75.78 | 7.98 | 77.55 | 5.48 |
| Math | | | | | | | | |
| AIME 2025 | 22.08 | 47.66 | 36.67 | 2.41 | 36.67 | 6.34 | 43.33 | 3.29 |
| OlympiadBench | 55.33 | 72.30 | 67.70 | 2.63 | 64.30 | 7.08 | 66.67 | 3.99 |
| GSM-Plus | 85.56 | 87.18 | 86.50 | 2.41 | 85.88 | 6.82 | 86.55 | 3.69 |
| CMATH | 95.42 | 96.40 | 95.72 | 1.98 | 95.63 | 4.94 | 94.99 | 2.56 |
| Omni-MATH | 33.20 | 48.80 | 41.70 | 2.57 | 41.70 | 6.41 | 43.60 | 3.56 |
| Agent & Alignment | | | | | | | | |
| IFEval-strict-prompt | 84.29 | 76.16 | 80.78 | 1.24 | 81.33 | 1.83 | 83.18 | 1.25 |
| BFCL v3 | 70.12 | 53.75 | 70.72 | 4.26 | 72.06 | 7.39 | 73.61 | 5.14 |
| Nexus FC | 37.71 | 34.38 | 35.18 | 4.06 | 31.59 | 8.27 | 33.69 | 4.91 |
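To read the speed and quality trade-off off the Average row, the short sketch below compares each LLaDA2.1-mini mode against LLaDA2.0-mini. It assumes TPF is a throughput-style figure where a higher value means faster decoding (the card does not define it in this excerpt); the `average` dictionary and the `compare` helper are illustrative names, not part of the model's API.

```python
# Average row of the benchmark table: (score, TPF) for each model or mode.
average = {
    "LLaDA2.0-mini": (63.39, 2.60),
    "LLaDA2.1-mini (S Mode)": (62.07, 5.34),
    "LLaDA2.1-mini (Q Mode)": (63.90, 3.12),
}


def compare(baseline: str, candidate: str) -> dict:
    """Return the score difference and TPF ratio of candidate vs. baseline."""
    base_score, base_tpf = average[baseline]
    cand_score, cand_tpf = average[candidate]
    return {
        "score_delta": round(cand_score - base_score, 2),
        "tpf_ratio": round(cand_tpf / base_tpf, 2),
    }


s_mode = compare("LLaDA2.0-mini", "LLaDA2.1-mini (S Mode)")
q_mode = compare("LLaDA2.0-mini", "LLaDA2.1-mini (Q Mode)")
print("S Mode vs LLaDA2.0-mini:", s_mode)
print("Q Mode vs LLaDA2.0-mini:", q_mode)
```

On these averages, S Mode reaches roughly 2.05 times the TPF of LLaDA2.0-mini at 1.32 points lower average score, while Q Mode reaches 1.2 times the TPF at 0.51 points higher average score.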
---
🚀 Highlights
+ Error-Correcting Editable: Structural innovation of editable generation for dLLM
+ Speedy vs Quality Mode: The 16B mini model achieves ultra-fast inference under Speed Mode while remaining competitive across various tasks and under Quality Mode.
+ Reinforcement Learning on 100B-scale dLLM: Tailored algorithm and framework to enable reinforcement learning for large dLLM.
🗺️ What's Next
+ Powerful Agentic/Tool Use Capability with LLaDA: Next update will be equipped with powerful Agentic and long-distance tool-use capability.
+ Extreme Editing: Next update will feature stronger and more extensive editing capabilities, aimed at correcting more errors in parallel reasoning.
+ Explore More Training Paradigms: We want to explore more training paradigms than SFT and RL for dLLM.
---
📦 Model Variants
| Model ID | Description | Hugging Face Link |
| --- | --- | --- |
| inclusionAI/LLaDA2.1-mini | Instruction-tuned model, ready for downstream applications. | 🤗 Model Card |
| inclusionAI/LLaDA2.1-flash | Instruction-tuned model, ready for downstream applications. | 🤗 Model Card |
---
🔍 Model Overview
LLaDA2.1-mini has the following specifications:
+ Type: Mixture-of-Experts (MoE) Diffusion Language Model
+ Total Parameters (Non-Embedding): 16B
+ Number of Layers: 20
+ Attention Heads: 16
+ Context Length: 32,768 tokens
+ Position Embedding: Rotary (RoPE)
+ Vocabulary Size: 157,184
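As a rough illustration of what the 32,768-token context length means in practice, the sketch below checks whether a prompt plus a requested generation length fits in the window. It assumes the window covers prompt and generated tokens together, which this card does not state explicitly; `fits_in_context` and `max_prompt_tokens` are hypothetical helper names.

```python
CONTEXT_LENGTH = 32_768  # context length from the specification list


def fits_in_context(prompt_tokens: int, gen_length: int,
                    context_length: int = CONTEXT_LENGTH) -> bool:
    """Return True if prompt plus generated tokens stay within the window.

    Assumption: prompt and output share one context window.
    """
    if prompt_tokens < 0 or gen_length < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + gen_length <= context_length


def max_prompt_tokens(gen_length: int,
                      context_length: int = CONTEXT_LENGTH) -> int:
    """Largest prompt size that still leaves room for gen_length tokens."""
    return max(context_length - gen_length, 0)


print(fits_in_context(prompt_tokens=1_000, gen_length=16_384))
print(max_prompt_tokens(gen_length=16_384))
```

Under that assumption, a 16384-token output budget leaves 16,384 tokens for the prompt.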
---
🤗 Hugging Face Transformers
Make sure transformers and its dependencies are installed, then run the following example:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/path/to/LLaDA2.1-mini"
device = "auto"

# trust_remote_code=True lets transformers load the model code shipped with the checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    model_path, trust_remote_code=True, device_map=device,
)
model = model.to(torch.bfloat16)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

prompt = """Calculate 1+5-28*0.5-200=?"""
# Wrap the prompt in the chat template and tokenize it.
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
)

# threshold=0.5 with editing_threshold=0 matches the Speed Mode settings in Best Practices.
generated_tokens = model.generate(
    inputs=input_ids,
    eos_early_stop=True,
    gen_length=512,
    block_length=32,
    threshold=0.5,
    editing_threshold=0,
    temperature=0.0,
)

generated_answer = tokenizer.decode(
    generated_tokens[0],
    skip_special_tokens=True,
)
print(generated_answer)
```

Best Practices
To achieve optimal performance, we recommend the following settings:
1. Sampling Parameters: We recommend the following general sampling parameters: block_length=32, temperature=0.0, top_p=None and top_k=None. We are currently exploring more diverse sampling configurations.
2. Denoising Thresholds: There are three denoising params: threshold, editing_threshold and max_post_steps. We recommend threshold=0.7, editing_threshold=0.5 for Quality Mode and threshold=0.5, editing_threshold=0.0 for Speed Mode. For both modes, we suggest setting max_post_steps to a value greater than 5. We recommend 16 as a balanced default, which was used for most of our internal testing.
Note: A low threshold may cause stuttering in exchange for faster inference.
3. Adequate Output Length: We recommend using an output length of 16384 tokens for most scenarios.
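The recommendations above can be collected into one place, as in the minimal sketch below. `MODE_THRESHOLDS` and `build_generate_kwargs` are hypothetical names introduced for illustration. Passing `max_post_steps` as a keyword argument to `model.generate` is an assumption, since the quick-start example does not pass it. `top_p` and `top_k` are left out because the recommended value for both is None.

```python
import warnings
from typing import Any, Dict

# Threshold pairs quoted in the Best Practices list.
MODE_THRESHOLDS = {
    "quality": {"threshold": 0.7, "editing_threshold": 0.5},
    "speed": {"threshold": 0.5, "editing_threshold": 0.0},
}


def build_generate_kwargs(mode: str, gen_length: int = 16384,
                          max_post_steps: int = 16) -> Dict[str, Any]:
    """Assemble keyword arguments for generation in the given mode.

    Assumption: max_post_steps is accepted as a generate() keyword.
    """
    if mode not in MODE_THRESHOLDS:
        raise ValueError(f"unknown mode {mode!r}; expected 'quality' or 'speed'")
    if max_post_steps <= 5:
        # The card suggests a value greater than 5 for both modes.
        warnings.warn("max_post_steps should be greater than 5")
    kwargs: Dict[str, Any] = {
        "gen_length": gen_length,
        "block_length": 32,
        "temperature": 0.0,
        "max_post_steps": max_post_steps,
    }
    kwargs.update(MODE_THRESHOLDS[mode])
    return kwargs


quality_kwargs = build_generate_kwargs("quality")
speed_kwargs = build_generate_kwargs("speed")
print(quality_kwargs)
print(speed_kwargs)
```

If the assumption about `max_post_steps` holds, the result could be unpacked into the call from the Transformers example, for instance `model.generate(inputs=input_ids, eos_early_stop=True, **quality_kwargs)`.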
---
🤖ModelScope
If you're in mainland China, we strongly recommend using our model from 🤖ModelScope.
---
Deployment
SGLang
SGLang enables dLLM inference either through offline batching or by launching an HTTP server…