What does this model signal mean?

Moonshot AI (Kimi) published moonshotai/Kimi-K2-Base. This model signal records what shipped on model infrastructure and how the release is positioned. High-signal details: license other · 8.7K HF downloads · Moonshot AI's base language model from the Kimi-K2 series. onlylabs links this event to 1 captured evidence page and 6 related model signals.

Moonshot AI (Kimi) Model: moonshotai/Kimi-K2-Base

Captured source

source ↗

Hugging Face · huggingface.co/moonshotai/Kimi-K2-Base

moonshotai/Kimi-K2-Base model card

published Jul 3, 2025 · seen Jun 6 · captured Jun 11 · http 200 · method plain · task text-generation · license other · library transformers · params 1026B · downloads 8.7k · likes 304

📰 Tech Blog | 📄 Paper Link (coming soon)

1. Model Introduction

Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters. Trained with the Muon optimizer, Kimi K2 achieves exceptional performance across frontier knowledge, reasoning, and coding tasks while being meticulously optimized for agentic capabilities.

Key Features

Large-Scale Training: Pre-trained a 1T parameter MoE model on 15.5T tokens with zero training instability.
MuonClip Optimizer: We apply the Muon optimizer at an unprecedented scale and develop novel optimization techniques to resolve instabilities while scaling up.
Agentic Intelligence: Specifically designed for tool use, reasoning, and autonomous problem-solving.
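
As a quick check on the scale figures quoted above, the following sketch computes the fraction of parameters active per token and the training tokens per parameter. It uses only the rounded numbers from the model card (32 billion activated, 1 trillion total, 15.5T tokens); the constant and function names are illustrative and do not come from the source.

```python
# Rounded figures as stated on the model card.
ACTIVATED_PARAMS = 32e9      # parameters active per token
TOTAL_PARAMS = 1e12          # total parameters across all experts
TRAINING_TOKENS = 15.5e12    # pre-training tokens


def scale_summary(activated: float, total: float, tokens: float) -> dict:
    """Derive simple ratios from the headline scale figures."""
    return {
        "active_fraction": activated / total,
        "tokens_per_total_param": tokens / total,
        "tokens_per_active_param": tokens / activated,
    }


summary = scale_summary(ACTIVATED_PARAMS, TOTAL_PARAMS, TRAINING_TOKENS)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

With these rounded inputs, about 3.2% of the parameters are active for any given token.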

Model Variants

Kimi-K2-Base: The foundation model, a strong start for researchers and builders who want full control for fine-tuning and custom solutions.
Kimi-K2-Instruct: The post-trained model best for drop-in, general-purpose chat and agentic experiences. It is a reflex-grade model without long thinking.

2. Model Summary

3. Evaluation Results

Instruction model evaluation results

• Bold denotes global SOTA, and underlined denotes open-source SOTA.

• Data points marked with * are taken directly from the model's tech report or blog.

• All metrics, except for SWE-bench Verified (Agentless), are evaluated with an 8k output token length. SWE-bench Verified (Agentless) is limited to a 16k output token length.

• Kimi K2 achieves 65.8% pass@1 on the SWE-bench Verified tests with bash/editor tools (single-attempt patches, no test-time compute). It also achieves a 47.3% pass@1 on the SWE-bench Multilingual tests under the same conditions. Additionally, we report results on SWE-bench Verified tests (71.6%) that leverage parallel test-time compute by sampling multiple sequences and selecting the single best via an internal scoring model.

• To ensure the stability of the evaluation, we employed avg@k on AIME, HMMT, CNMO, PolyMath-en, GPQA-Diamond, EvalPlus, and Tau2.

• Some data points have been omitted due to prohibitively expensive evaluation costs.
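
The notes above mention pass@1, avg@k, and selecting the best of several sampled sequences with a scoring model, but this excerpt does not define them. The sketch below shows one common reading: avg@k as the mean correctness over k samples per problem, averaged across problems, and best-of-n as picking the highest-scored candidate. The function names, the data layout, and the scorer are hypothetical; the internal scoring model and the exact protocol are not described here.

```python
from typing import Callable, Sequence


def avg_at_k(results: Sequence[Sequence[bool]]) -> float:
    """Mean per-problem accuracy, where each problem has k sampled attempts.

    results[i][j] is True if sample j for problem i was correct.
    With k = 1 this reduces to pass@1 (single-attempt accuracy).
    """
    if not results:
        raise ValueError("no problems to score")
    per_problem = [sum(samples) / len(samples) for samples in results]
    return sum(per_problem) / len(per_problem)


def select_best(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Best-of-n: return the candidate with the highest score."""
    return max(candidates, key=score)


# Toy data: 2 problems, 4 samples each.
toy_results = [
    [True, True, False, True],
    [False, False, True, False],
]
print(avg_at_k(toy_results))

# Stand-in scorer for illustration only: prefers shorter candidates.
toy_patches = ["patch-aaaa", "patch-b", "patch-ccc"]
print(select_best(toy_patches, score=lambda p: -len(p)))
```

Averaging over k samples reduces the variance of a single-sample estimate, which is presumably why it is applied to the smaller benchmarks listed above.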

---

Base model evaluation results

• We only evaluate open-source

Excerpt shown — open the source for the full document.

Notability

notability 7.0/10

New base model with solid traction.