CoreWeave, published Apr 20, 2026

CoreWeave Is Now the Fastest at Inference on One of the Best Open-Source Models, Kimi K2.6


CoreWeave Leads Artificial Analysis Kimi K2.6 Benchmark | CoreWeave Blog



Output Speed (output tokens per second, higher is better, 10,000 input tokens). Accurate as of 5/11/2026.

CoreWeave: 205
Clarifai: 158
Azure: 125
Cloudflare: 95
Fireworks: 80
SiliconFlow (FP8): 78
Novita: 62
Kimi: 48
Together.ai (FP4): 44
DeepInfra (FP4): 38
Parasail: 22

Among providers serving Kimi K2.6, the flagship open-source model from Moonshot AI, CoreWeave delivers the highest token throughput with the lowest latency, resulting in the best price-performance.[1] In the rest of this article, we explain what that means for your workloads and take a technical look at the optimizations behind CoreWeave Inference.

Kimi K2.6 is one of the best open-source models

Kimi K2.6 is Moonshot AI's flagship open-source model, released on April 20, 2026. It is one of the most widely used open-weight models, ranking #2 on OpenRouter's leaderboards. It is a 1-trillion-parameter Mixture-of-Experts (MoE) model with 32 billion active parameters per token, native multimodal input, and a 262K-token context window. K2.6 is competitive with leading proprietary frontier models, scoring 86.3 on BrowseComp, 80.2 on SWE-Bench Verified, and 66.7 on Terminal-Bench 2.0. It has quickly become a favored model for coding agents and multi-step agent frameworks.

CoreWeave leads in Artificial Analysis benchmarks for Kimi K2.6

Artificial Analysis evaluates the two metrics that matter most to production teams: output speed in tokens per second, and price per million tokens (a 7:2:1 blend of cache-hit, input, and output token prices). The "most attractive" quadrant combines high speed with low price. CoreWeave is the leading inference provider for Kimi K2.6 in that quadrant today.
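As a rough illustration of the two Artificial Analysis metrics described above, the sketch below computes a 7:2:1 blended price as a weighted average of cache-hit, input, and output prices, then picks out providers in the high-speed, low-price quadrant. The normalization (dividing by the weight sum of 10), the use of median speed and median price as quadrant boundaries, and all provider names and prices are hypothetical assumptions for illustration; Artificial Analysis's exact methodology may differ.

```python
from dataclasses import dataclass
from statistics import median

# Weights for the 7:2:1 blend of cache-hit, input, and output token prices.
BLEND_WEIGHTS = (7, 2, 1)


def blended_price(cache_hit: float, input_: float, output: float) -> float:
    """Weighted-average price in USD per million tokens (normalization is an assumption)."""
    w_cache, w_in, w_out = BLEND_WEIGHTS
    total = w_cache + w_in + w_out
    return (w_cache * cache_hit + w_in * input_ + w_out * output) / total


@dataclass
class Provider:
    name: str
    tokens_per_second: float
    price_per_m: float  # blended USD per million tokens


def most_attractive(providers: list[Provider]) -> list[str]:
    """Providers faster than the median speed AND cheaper than the median price.

    Median boundaries are an illustrative assumption, not Artificial Analysis's rule.
    """
    speed_cut = median(p.tokens_per_second for p in providers)
    price_cut = median(p.price_per_m for p in providers)
    return [
        p.name
        for p in providers
        if p.tokens_per_second > speed_cut and p.price_per_m < price_cut
    ]


if __name__ == "__main__":
    # Hypothetical prices (USD per million tokens), not real provider pricing.
    print(blended_price(cache_hit=0.15, input_=0.60, output=2.50))
    fleet = [
        Provider("provider-a", 180.0, blended_price(0.10, 0.50, 2.00)),
        Provider("provider-b", 90.0, blended_price(0.20, 0.80, 3.00)),
        Provider("provider-c", 60.0, blended_price(0.10, 0.40, 1.80)),
        Provider("provider-d", 150.0, blended_price(0.30, 1.00, 4.00)),
    ]
    print(most_attractive(fleet))
```

Because cache-hit tokens carry the largest weight in this blend, a one-dollar change in a provider's cache-hit price moves its blended price seven times as much as the same change in its output price.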

Speed vs. Price for Kimi K2.6 (source: Artificial Analysis). Artificial Analysis describes this view as a comparison of blended price (a 7:2:1 blend of cache-hit, input, and output token prices) against output speed across 11 providers. This is an independent third-party measurement.

The winning optimization: training and deploying NVFP4 and EAGLE3 on NVIDIA GB300

The CoreWeave Applied Training team, an internal research group of domain experts, uses our NVIDIA GB300 and GB200 NVL72 clusters alongside advanced optimization techniques, including NVIDIA Model Optimizer post-training quantization (PTQ), to train and fine-tune the latest AI models. Every experiment is backed by rigorous quality and accuracy evaluations using Weights & Biases and modern test-harness frameworks.

Training a custom NVFP4-quantized model, together with EAGLE3 speculative decoding, led to the greatest performance gains on Kimi K2.6. We also validated the model's quality across a range of notable benchmarks, including Terminal-Bench 2.0, SWE-Bench Verified, SWE-Bench Pro, GPQA Diamond, AIME 2026, and LiveCodeBench v6. Validating the NVFP4 weights across these benchmarks is essential to improving performance without degrading quality.

Beyond model-level optimization, CoreWeave integrates performance enhancements throughout every layer of the CoreWeave Inference stack. From bare-metal access to the latest NVIDIA accelerators and high-speed interconnects, to optimized memory architectures and custom inference-stack tuning, CoreWeave is engineered to deliver industry-leading inference performance.

Why price-performance matters for inference as enterprises scale AI

In production inference, every lever that improves performance also affects cost: GPU selection, quantization, attention backend, KV cache layout, batch sizing, speculative decoding, and kernels. Real-time, customer-facing workloads demand low latency, while offline workloads such as evaluations demand high throughput at low cost.
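Quantization is one of the levers listed above. To make the NVFP4 format concrete, here is a minimal, simplified sketch of block-scaled FP4 quantization; it is not NVIDIA Model Optimizer's implementation or CoreWeave's recipe. It assumes the publicly described NVFP4 layout (4-bit E2M1 values sharing one scale factor per 16-element block) but keeps each block scale as a full-precision float instead of FP8 E4M3 and omits the per-tensor FP32 scale. It also estimates weight memory for a 1-trillion-parameter model, assuming every parameter is stored in the same format, which real checkpoints may not do. Function and constant names such as `fake_quantize` and `E2M1_GRID` are illustrative.

```python
import math
import random

# Non-negative magnitudes representable by FP4 E2M1 (the sign is applied separately).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # NVFP4 shares one scale factor across 16 consecutive values


def quantize_block(block: list[float]) -> tuple[float, list[float]]:
    """Round one block onto the E2M1 grid using a shared scale.

    Simplification: the scale stays a Python float; real NVFP4 stores it as FP8 (E4M3)
    and additionally applies a per-tensor FP32 scale.
    """
    amax = max(abs(x) for x in block)
    scale = amax / E2M1_GRID[-1] if amax > 0 else 1.0  # map the block max onto 6.0
    codes = []
    for x in block:
        mag = abs(x) / scale
        nearest = min(E2M1_GRID, key=lambda g: abs(g - mag))  # round to nearest grid point
        codes.append(math.copysign(nearest, x))
    return scale, codes


def dequantize_block(scale: float, codes: list[float]) -> list[float]:
    return [scale * c for c in codes]


def fake_quantize(values: list[float]) -> list[float]:
    """Quantize then dequantize a flat tensor, one 16-value block at a time."""
    out: list[float] = []
    for start in range(0, len(values), BLOCK_SIZE):
        scale, codes = quantize_block(values[start:start + BLOCK_SIZE])
        out.extend(dequantize_block(scale, codes))
    return out


def weight_gigabytes(num_params: float, bits_per_weight: float) -> float:
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    random.seed(0)
    weights = [random.gauss(0.0, 0.02) for _ in range(4096)]
    approx = fake_quantize(weights)
    rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(weights, approx)) / len(weights))
    print(f"RMSE after NVFP4-style round trip: {rmse:.6f}")

    # 4-bit values plus one 8-bit scale per 16 values is about 4.5 bits per weight.
    nvfp4_bits = 4 + 8 / BLOCK_SIZE
    for label, bits in (("BF16", 16), ("FP8", 8), ("NVFP4", nvfp4_bits)):
        print(f"{label}: ~{weight_gigabytes(1e12, bits):,.1f} GB for 1T parameters")
```

At roughly 4.5 bits per weight, the estimate comes to about 562.5 GB versus 1,000 GB for FP8, which frees memory for KV cache and larger batches. The coarser grid also introduces rounding error, which is why quality validation on benchmarks like those listed above matters.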
CoreWeave Inference lets customers tune cost-performance for each workload, with options ranging from token-based consumption to GPU-based scaling. Serverless Inference provides the fastest path to deployment, giving developers access to a broad selection of models through a single unified API. Dedicated Inference offers more flexibility, letting customers optimize performance through explicit GPU selection, runtime configuration, and autoscaling while CoreWeave manages the underlying infrastructure. For maximum control, Inference on CoreWeave Kubernetes Service (CKS) gives customers direct control over GPUs, runtimes, orchestration, and capacity management.
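To illustrate how throughput turns into cost when paying per GPU-hour rather than per token, the sketch below converts a deployment's hourly GPU cost and sustained aggregate throughput into a price per million output tokens, and finds the throughput at which GPU-hour pricing matches a given per-token price. All prices, GPU counts, throughputs, and the acceptance rate are hypothetical. The speculative decoding estimate uses the idealized independent-acceptance formula E = (1 - a^(k+1)) / (1 - a) from the speculative decoding literature, where a is the per-token acceptance rate and k the draft length; it is not a measured EAGLE3 result and ignores draft-head compute and batching effects, so treat it as an upper bound.

```python
def cost_per_million_tokens(gpu_hourly_usd: float, num_gpus: int, tokens_per_second: float) -> float:
    """USD per million output tokens when paying per GPU-hour at a sustained aggregate throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_usd * num_gpus / tokens_per_hour * 1_000_000


def breakeven_tokens_per_second(gpu_hourly_usd: float, num_gpus: int, price_per_million: float) -> float:
    """Aggregate throughput at which GPU-hour pricing equals a given per-token price."""
    return gpu_hourly_usd * num_gpus * 1_000_000 / (price_per_million * 3600)


def expected_tokens_per_pass(acceptance_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per target-model forward pass with speculative decoding.

    Idealized model: each drafted token is accepted independently with probability
    `acceptance_rate`; real EAGLE3 acceptance depends on the prompt and the draft head.
    """
    if not 0.0 <= acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be in [0, 1]")
    if acceptance_rate == 1.0:
        return draft_len + 1.0
    return (1 - acceptance_rate ** (draft_len + 1)) / (1 - acceptance_rate)


if __name__ == "__main__":
    # Hypothetical deployment: 8 GPUs at $10 per GPU-hour, 5,000 output tokens/s in aggregate.
    hourly, gpus, tps = 10.0, 8, 5_000.0
    print(f"baseline: ${cost_per_million_tokens(hourly, gpus, tps):.2f} per 1M tokens")

    # Upper bound only: ignores draft-head compute, verification overhead, and batching effects.
    multiplier = expected_tokens_per_pass(acceptance_rate=0.7, draft_len=3)
    sped_up = cost_per_million_tokens(hourly, gpus, tps * multiplier)
    print(f"with speculative decoding (x{multiplier:.2f}): ${sped_up:.2f} per 1M tokens")

    # Against a hypothetical $2.00 per 1M token price, GPU-hour pricing is cheaper above this throughput.
    print(f"break-even: {breakeven_tokens_per_second(hourly, gpus, 2.00):,.0f} tokens/s")
```

The same arithmetic reflects the trade-off between workload types: batch sizing and speculative decoding raise tokens per GPU-hour, lowering the effective per-token price on Dedicated Inference or CKS, while latency-sensitive workloads may deliberately run smaller batches at a higher per-token cost.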

[1] Price-performance is measured as speed vs. price by Artificial Analysis.

CoreWeave Inference leads Artificial Analysis benchmarks for Kimi K2.6 output speed and ranks in the most attractive price-performance quadrant.
