What does this model signal mean?

LG AI Research (EXAONE) published LGAI-EXAONE/K-EXAONE-236B-A23B. This model signal is evidence of what shipped on model infrastructure and how the release is positioned. High-signal details: license other · 46.5K HF downloads · 236B-parameter MoE model from LG AI Research, 23B active. onlylabs links this event to 1 captured evidence page and 6 related model signals.

LG AI Research (EXAONE) Model: LGAI-EXAONE/K-EXAONE-236B-A23B

Captured source

Hugging Face · huggingface.co/LGAI-EXAONE/K-EXAONE-236B-A23B

LGAI-EXAONE/K-EXAONE-236B-A23B model card

published Dec 26, 2025 · seen Jun 6 · captured Jun 11 · http 200 · method plain · task text-generation · license other · library transformers · params 237B · downloads 47k · likes 570

K-EXAONE-236B-A23B

Introduction

We introduce K-EXAONE, a large-scale multilingual language model developed by LG AI Research. Built using a Mixture-of-Experts architecture, K-EXAONE features 236 billion total parameters, with 23 billion active during inference. Performance evaluations across various benchmarks demonstrate that K-EXAONE excels in reasoning, agentic capabilities, general knowledge, multilingual understanding, and long-context processing.

Key Features

Architecture & Efficiency: Features a 236B fine-grained MoE design (23B active) optimized with Multi-Token Prediction (MTP), enabling self-speculative decoding that boosts inference throughput by approximately 1.5x.
Long-Context Capabilities: Natively supports a 256K context window, utilizing a 3:1 hybrid attention scheme with a 128-token sliding window to significantly minimize memory usage during long-document processing.
Multilingual Support: Covers 6 languages: Korean, English, Spanish, German, Japanese, and Vietnamese. Features a redesigned 150k vocabulary with SuperBPE, improving token efficiency by ~30%.
Agentic Capabilities: Demonstrates superior tool-use and search capabilities via multi-agent strategies.
Safety & Ethics: Aligned with universal human values, the model uniquely incorporates Korean cultural and historical contexts to address regional sensitivities often overlooked by other models. It demonstrates high reliability across diverse risk categories.
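The roughly 1.5x throughput figure quoted for self-speculative decoding can be read against a simple acceptance model. The sketch below is an illustration only, not LG AI Research's method. It assumes one drafted token per step, a fixed acceptance probability, and a hypothetical relative cost for the draft head. The function names `expected_tokens_per_step` and `estimated_speedup` are invented for this example.

```python
def expected_tokens_per_step(accept_prob: float, draft_len: int = 1) -> float:
    """Expected tokens emitted per verification pass of the main model.

    Each pass always yields one token from the main model. Drafted token k
    only counts if drafts 1..k were all accepted (probability accept_prob**k).
    """
    if not 0.0 <= accept_prob <= 1.0:
        raise ValueError("accept_prob must be in [0, 1]")
    return 1.0 + sum(accept_prob ** k for k in range(1, draft_len + 1))


def estimated_speedup(accept_prob: float, draft_len: int = 1,
                      draft_cost: float = 0.0) -> float:
    """Speed-up over plain decoding.

    draft_cost is the assumed cost of drafting one token, expressed as a
    fraction of one main-model pass (a made-up knob for this sketch).
    """
    step_cost = 1.0 + draft_cost * draft_len
    return expected_tokens_per_step(accept_prob, draft_len) / step_cost


# With a free draft head, accepting one drafted token half the time gives 1.5x.
print(estimated_speedup(0.5))  # 1.5
# A small drafting overhead needs a somewhat higher acceptance rate.
print(estimated_speedup(0.6, draft_cost=0.05))
```

Under these assumptions, a 1.5x gain corresponds to about half of the drafted tokens being accepted. Real throughput also depends on batching, kernels, and hardware, which this model ignores.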

For more details, please refer to the technical report, blog and GitHub.

![main_figure](assets/main_figure.png)

Model Configuration

Number of Parameters: 236B in total and 23B activated
Number of Parameters (without embeddings): 234B
Hidden Dimension: 6,144
Number of Layers: 48 main layers + 1 MTP layer
Hybrid Attention Pattern: 12 x (3 Sliding window attention + 1 Global attention)
  Sliding Window Attention:
    Number of Attention Heads: 64 Q-heads and 8 KV-heads
    Head Dimension: 128 for both Q/KV
    Sliding Window Size: 128
  Global Attention:
    Number of Attention Heads: 64 Q-heads and 8 KV-heads
    Head Dimension: 128 for both Q/KV
    No Rotary Positional Embedding Used (NoPE)
Mixture of Experts:
  Number of Experts: 128
  Number of Activated Experts: 8
  Number of Shared Experts: 1
  MoE Intermediate Size: 2,048
Vocab Size: 153,600
Context Length: 262,144 tokens
Knowledge Cutoff: Dec 2024 (2024/12)
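To make the memory effect of the hybrid attention pattern concrete, the following sketch estimates the KV-cache size for one sequence at the full context length, using the head counts, head dimension, window size, and layer pattern listed above. It is a back-of-the-envelope calculation, not the model's actual cache implementation. It assumes a bf16 cache (2 bytes per value), ignores the MTP layer, and assumes each sliding-window layer keeps at most 128 tokens. The `AttentionConfig` class and both functions are hypothetical names for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionConfig:
    # Values copied from the Model Configuration list.
    num_blocks: int = 12          # 12 x (3 sliding + 1 global)
    sliding_per_block: int = 3
    global_per_block: int = 1
    kv_heads: int = 8
    head_dim: int = 128
    sliding_window: int = 128
    bytes_per_value: int = 2      # assumption: bf16 KV cache


def layer_pattern(cfg: AttentionConfig) -> list[str]:
    """Layer types in order. The order inside a block is an assumption
    (sliding layers first); it does not change the totals below."""
    block = ["sliding"] * cfg.sliding_per_block + ["global"] * cfg.global_per_block
    return block * cfg.num_blocks


def kv_cache_bytes(cfg: AttentionConfig, context_len: int, hybrid: bool = True) -> int:
    """KV-cache bytes for one sequence; hybrid=False treats every layer as global."""
    per_token = 2 * cfg.kv_heads * cfg.head_dim * cfg.bytes_per_value  # K and V
    total = 0
    for kind in layer_pattern(cfg):
        if hybrid and kind == "sliding":
            cached_tokens = min(context_len, cfg.sliding_window)
        else:
            cached_tokens = context_len
        total += cached_tokens * per_token
    return total


cfg = AttentionConfig()
ctx = 262_144
gib = 1024 ** 3
print(len(layer_pattern(cfg)))                        # 48
print(kv_cache_bytes(cfg, ctx) / gib)                 # about 12.02
print(kv_cache_bytes(cfg, ctx, hybrid=False) / gib)   # 48.0
```

Under these assumptions the 36 sliding-window layers contribute only about 18 MiB, so nearly all of the cache comes from the 12 global layers, about a quarter of what 48 global layers would need.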

Evaluation Results

The following table shows the evaluation results of the K-EXAONE model in reasoning mode, compared to our previous model, EXAONE-4.0, and other competing models. The evaluation details can be found in the technical report.

| Benchmark | K-EXAONE (Reasoning) | EXAONE 4.0 (Reasoning) | GPT-OSS (Reasoning: High) | Qwen3-Thinking-2507 | DeepSeek-V3.2 (Reasoning) |
| --- | --- | --- | --- | --- | --- |
| Architecture | MoE | Dense | MoE | MoE | MoE |
| Total Params | 236B | 32B | 117B | 235B | 671B |
| Active Params | 23B | 32B | 5.1B | 22B | 37B |
| **World Knowledge** | | | | | |
| MMLU-Pro | 83.8 | 81.8 | 80.7 | 84.4 | 85.0 |
| GPQA-Diamond | 79.1 | 75.4 | 80.1 | 81.1 | 82.4 |
| Humanity's Last Exam | 13.6 | 10.6 | 14.9 | 18.2 | 25.1 |
| **Math** | | | | | |
| IMO-AnswerBench | 76.3 | 66.1 | 75.6 | 74.8 | 78.3 |
| AIME 2025 | 92.8 | 85.3 | 92.5 | 92.3 | 93.1 |
| HMMT Nov 2025 | 86.8 | 78.1 | 84.9 | 88.8 | 90.2 |
| **Coding / Agentic Coding** | | | | | |
| LiveCodeBench Pro 25Q2 (Medium) | 25.9 | 4.8 | 35.4 | 16.0 | 27.9 |
| LiveCodeBench v6 | 80.7 | 66.7 | 81.9 | 74.1 | 79.4 |
| Terminal-Bench 2.0 | 29.0 | - | 18.7 | 13.3 | 46.4 |
| SWE-Bench Verified | 49.4 | - | 62.4 | 25.0 | 73.1 |
| **Agentic Tool Use** | | | | | |
| τ2-Bench (Retail) | 78.6 | 67.5 | 69.1 | 71.9 | 77.9 |
| τ2-Bench (Airline) | 60.4 | 52.0 | 60.5 | 58.0 | 66.0 |
| τ2-Bench (Telecom) | 73.5 | 23.7 | 60.3 | 45.6 | 85.8 |
| BrowseComp | 31.4 | - | - | - | 51.4 |
| **Instruction Following** | | | | | |
| IFBench | 67.3 | 36.0 | 69.5 | 52.6 | 62.5 |
| IFEval | 89.7 | 84.7 | 89.5 | 87.8 | 92.6 |
| **Long Context Understanding** | | | | | |
| AA-LCR | 53.5 | 14.0 | 50.7 | 67.0 | 65.0 |
| OpenAI-MRCR | 52.3 | 20.1 | 29.9 | 58.6 | 57.7 |
| **Korean** | | | | | |
| KMMLU-Pro | 67.3 | 67.7 | 62.4 | 71.6 | 72.1 |
| KoBALT | 61.8 | 25.4 | 54.3 | 56.1 | 62.7 |
| CLIcK | 83.9 | 78.8 | 74.6 | 81.3 | 86.3 |
| HRM8K | 90.9 | 89.4 | 91.6 | 92.0 | 90.6 |
| Ko-LongBench | 86.8 | 68.0 | 82.2 | 83.2 | 87.9 |
| **Multilinguality** | | | | | |
| MMMLU | 85.7 | 83.2 | 83.8 | 87.3 | 88.0 |
| WMT24++ | 90.5 | 80.8 | 93.6 | 94.7 | 90.0 |
| **Safety** | | | | | |
| Wild-Jailbreak | 89.9 | 62.8 | 98.2 | 85.5 | 79.1 |
| KGC-Safety | 96.1 | 58.0 | 92.5 | 66.2 | 73.0 |

Requirements

K-EXAONE is supported by multiple libraries. Please install the required libraries as needed for your use case.

Transformers

You should install transformers >= 5.1.0 for the K-EXAONE model.

vLLM

To serve the K-EXAONE model on a vLLM server, you should install both Transformers and vLLM (vllm >= 0.14.0).
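Once a vLLM server is running, it is typically queried through its OpenAI-compatible HTTP API. The sketch below builds such a chat completion request with the Python standard library only. The base URL, port, and served model name are assumptions for illustration. Passing `enable_thinking` through `chat_template_kwargs` assumes the server forwards it to the model's chat template; check the model card and vLLM documentation for the exact serving flags.

```python
import json
import urllib.request

# Assumptions for illustration: a vLLM server started separately, listening
# on a local port, and serving the model under its Hugging Face name.
BASE_URL = "http://localhost:8000/v1"
MODEL = "LGAI-EXAONE/K-EXAONE-236B-A23B"


def build_chat_request(prompt: str, enable_thinking: bool = True) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1.0,
        "top_p": 0.95,
        # Assumed to be forwarded to the chat template by the server.
        "chat_template_kwargs": {"enable_thinking": enable_thinking},
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the assistant message text."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


request = build_chat_request("Which one is bigger, 3.9 vs 3.12?")
print(request.full_url)
# print(send(request))  # uncomment once a server is actually running
```

Keeping request construction separate from sending makes the payload easy to inspect before any server exists.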

SGLang

To serve the K-EXAONE model on an SGLang server, you should install both Transformers and SGLang. You can install the latest version of SGLang from source using the following commands.

git clone https://github.com/sgl-project/sglang.git
pip install -e sglang/python

llama.cpp

To use the K-EXAONE model with the llama.cpp library, you should install llama.cpp >= b7737.
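A quick way to confirm that an environment meets the minimum Python package versions listed above is to compare the installed versions against them. The sketch below is a simple checker and not part of the model card. It covers only the two packages with stated minimums (transformers and vllm); the llama.cpp build tag is not a Python package and is left out. The version parsing is deliberately crude; `packaging.version` is more robust where that library is available.

```python
from importlib import metadata

# Minimum versions taken from the Requirements section.
MINIMUM_VERSIONS = {"transformers": "5.1.0", "vllm": "0.14.0"}


def version_tuple(version: str) -> tuple[int, ...]:
    """Turn '5.1.0' into (5, 1, 0), stopping at the first non-numeric part,
    so '5.1.0.dev0' also becomes (5, 1, 0)."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check_requirements(minimums: dict[str, str]) -> dict[str, str]:
    """Return a status per package: 'ok', 'too old (x)', or 'not installed'."""
    report = {}
    for package, minimum in minimums.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = "not installed"
            continue
        ok = version_tuple(installed) >= version_tuple(minimum)
        report[package] = "ok" if ok else f"too old ({installed})"
    return report


for package, status in check_requirements(MINIMUM_VERSIONS).items():
    print(f"{package}: {status}")
```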

Quickstart

You can use the K-EXAONE model with the Transformers library version 5.1.0 or later.

Reasoning mode

For tasks that require accurate results, you can run the K-EXAONE model in reasoning mode as below.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "LGAI-EXAONE/K-EXAONE-236B-A23B"

# Load the weights in bfloat16 and let device_map="auto" place them on the available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [
    {"role": "system", "content": "You are K-EXAONE, a large language model developed by LG AI Research in South Korea, built to serve as a helpful and reliable assistant."},
    {"role": "user", "content": "Which one is bigger, 3.9 vs 3.12?"}
]
# Render the chat template and tokenize it; the result is used as a dict below
# (unpacked with ** and indexed by 'input_ids').
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    enable_thinking=True,  # skippable (default: True)
)

# Sample a response with the settings used for reasoning mode.
generated_ids = model.generate(
    **input_ids.to(model.device),
    max_new_tokens=16384,
    temperature=1.0,
    top_p=0.95,
    do_sample=True,
)
# Drop the prompt tokens so that only the newly generated text is decoded.
output_ids = generated_ids[0][input_ids['input_ids'].shape[-1]:]
print(tokenizer.decode(output_ids, skip_special_tokens=True))

Non-reasoning mode

For tasks where latency matters more than accuracy, you can run the K-EXAONE model in non-reasoning mode as below.

messages = [
{"role": "system",...

Excerpt shown — open the source for the full document.

Notability

notability 7.0/10

Notable MoE model release, decent downloads