What does this model signal mean?

Moonshot AI (Kimi) published moonshotai/Kimi-K2-Instruct. This model signal is evidence of what shipped on model infrastructure and how the release is positioned. High-signal details: license other · 201.5K HF downloads · Moonshot AI's instruct-tuned large language model Kimi K2.. onlylabs links this event to 1 captured evidence page and 6 related model signals.

Moonshot AI (Kimi) Model: moonshotai/Kimi-K2-Instruct

Captured source

source ↗

Hugging Face/huggingface.co/moonshotai/Kimi-K2-Instruct

moonshotai/Kimi-K2-Instruct model card

Source ↗

published Jul 11, 2025seen Jun 6captured Jun 11http 200method plaintask text-generationlicense otherlibrary transformersparams 1026Bdownloads 202klikes 2.4k

📰 Tech Blog | 📄 Paper

0. Changelog

2025.8.11

Messages with name field are now supported. We’ve also moved the chat template to a standalone file for easier viewing.

2025.7.18

We further modified our chat template to improve its robustness. The default system prompt has also been updated.

2025.7.15

We have updated our tokenizer implementation. Now special tokens like [EOS] can be encoded to their token ids.
We fixed a bug in the chat template that was breaking multi-turn tool calls.

1. Model Introduction

Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters. Trained with the Muon optimizer, Kimi K2 achieves exceptional performance across frontier knowledge, reasoning, and coding tasks while being meticulously optimized for agentic capabilities.

Key Features

Large-Scale Training: Pre-trained a 1T parameter MoE model on 15.5T tokens with zero training instability.
MuonClip Optimizer: We apply the Muon optimizer to an unprecedented scale, and develop novel optimization techniques to resolve instabilities while scaling up.
Agentic Intelligence: Specifically designed for tool use, reasoning, and autonomous problem-solving.

Model Variants

Kimi-K2-Base: The foundation model, a strong start for researchers and builders who want full control for fine-tuning and custom solutions.
Kimi-K2-Instruct: The post-trained model best for drop-in, general-purpose chat and agentic experiences. It is a reflex-grade model without long thinking.

2. Model Summary

3. Evaluation Results

Instruction model evaluation results

• Bold denotes global SOTA, and underlined denotes open-source SOTA.

• Data points marked with * are taken directly from the model's tech report or blog.

• All metrics, except for SWE-bench Verified (Agentless), are evaluated with an 8k output token length. SWE-bench Verified (Agentless) is limited to a 16k output token length.

• Kimi K2 achieves 65.8% pass@1 on the SWE-bench Verified tests with bash/editor tools (single-attempt patches, no test-time compute). It also achieves a 47.3% pass@1 on the SWE-bench Multilingual tests under the same conditions. Additionally, we report results on SWE-bench Verified tests (71.6%) that leverage parallel test-time compute by sampling multiple sequences and selecting the single best via an internal scoring model.

• To ensure the stability of the evaluation, we employed avg@k on the AIME, HMMT, CNMO, PolyMath-en, GPQA-Diamond, EvalPlus, Tau2.

• Some data points have been omitted due to prohibitively expensive evaluation costs.

---

Base model evaluation results

Benchmark Metric Shot Kimi K2 Base Deepseek-V3-Base Qwen2.5-72B Llama 4 Maverick

General Tasks

MMLU EM 5-shot 87.8 87.1 86.1 84.9

MMLU-pro EM 5-shot 69.2 60.6 62.8 63.5

MMLU-redux-2.0 EM 5-shot 90.2 89.5 87.8 88.2

SimpleQA Correct 5-shot 35.3 26.5 10.3 23.7

TriviaQA EM 5-shot 85.1 84.1 76.0 79.3

GPQA-Diamond Avg@8 5-shot 48.1 50.5 40.8 49.4

SuperGPQA EM 5-shot 44.7 39.2 34.2 38.8

Coding Tasks

LiveCodeBench v6 Pass@1 1-shot 26.3 22.9 21.1 25.1

EvalPlus Pass@1 - 80.3 65.6 66.0 65.5

Mathematics Tasks

MATH EM 4-shot 70.2 60.1 61.0 63.0

GSM8k EM 8-shot 92.1 91.7 90.4 86.3

<td align="

Excerpt shown — open the source for the full document.

Notability

notability 9.0/10

High traction, notable model from Moonshot AI