CoreWeave Is Now the Fastest at Inference on One of the Best Open-Source Models: Kimi K2.6
CoreWeave Leads Artificial Analysis Kimi K2.6 Benchmark | CoreWeave Blog
Output Speed: output tokens per second (higher is better), measured with 10,000 input tokens. Accurate as of 5/11/2026.
CoreWeave: 205
Clarifai: 158
Azure: 125
Cloudflare: 95
Fireworks: 80
SiliconFlow (FP8): 78
Novita: 62
Kimi: 48
Together.ai (FP4): 44
DeepInfra (FP4): 38
Parasail: 22
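Output speed translates directly into how long a user or agent waits for a response. The Python sketch below is a minimal illustration that divides a response length by the output speeds from the Output Speed chart. It assumes a steady decode rate and ignores time to first token (queueing and prefill of the 10,000-token prompt), so real end-to-end latency will be higher. The 2,000-token response length and the `streaming_seconds` helper are hypothetical.

```python
# Output speeds copied from the Output Speed chart (output tokens/s,
# 10,000 input tokens, accurate as of 5/11/2026). Only a subset is shown.
OUTPUT_SPEED_TPS = {
    "CoreWeave": 205,
    "Clarifai": 158,
    "Azure": 125,
    "Parasail": 22,
}


def streaming_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Seconds to stream `output_tokens` at a constant output speed.

    Assumption: time to first token (queueing plus prompt prefill) is not
    included, so end-to-end latency will be higher in practice.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


if __name__ == "__main__":
    RESPONSE_TOKENS = 2_000  # hypothetical length of one agent step's output
    for provider, tps in OUTPUT_SPEED_TPS.items():
        seconds = streaming_seconds(RESPONSE_TOKENS, tps)
        print(f"{provider:>10}: {seconds:5.1f} s")
```

With these inputs, a 2,000-token step streams in about 9.8 s at 205 tokens/s, compared with about 91 s at 22 tokens/s. For multi-step agents, that gap grows with every step.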
Among providers serving Kimi K2.6, the flagship open-source model from Moonshot AI, CoreWeave delivers the highest token throughput with the lowest latency, delivering the best price-performance [1]. In the rest of this article, we explain what that means for your workloads and take a technical look at the optimizations behind CoreWeave Inference.

Kimi K2.6 is one of the best open-source models

Kimi K2.6 is Moonshot AI's flagship open-source model, released on April 20, 2026. It is one of the most widely used open-weight models and ranks #2 on OpenRouter's leaderboards. It is a 1-trillion-parameter Mixture-of-Experts (MoE) model with 32 billion active parameters per token, native multimodal input, and a 262K-token context window. K2.6 is competitive with leading proprietary frontier models, scoring 86.3 on BrowseComp, 80.2 on SWE-Bench Verified, and 66.7 on Terminal-Bench 2.0. It has quickly become a favored model for coding agents and multi-step agent frameworks.

CoreWeave leads in Artificial Analysis benchmarks for Kimi K2.6

Artificial Analysis evaluates the two metrics that matter most to production teams: output speed in tokens per second, and price per million tokens (a 7:2:1 blend of cache-hit, input, and output token prices). On its speed-vs-price chart, the "most attractive" quadrant combines high speed with low price. CoreWeave is currently the leading Kimi K2.6 inference provider in that quadrant.
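The 7:2:1 blend and the quadrant test fit in a few lines. This Python sketch makes several assumptions. It treats the blend as a weighted average of per-million-token prices, with weights 7 (cache-hit input), 2 (input), and 1 (output). It also draws the quadrant boundaries at the median speed and median blended price of the providers passed in, whereas Artificial Analysis may place its quadrant lines differently. The provider names, prices, and speeds are hypothetical placeholders, not measured values.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class ProviderPoint:
    name: str
    output_tps: float           # output speed, tokens per second
    cache_hit_usd_per_m: float  # USD per 1M cached input tokens
    input_usd_per_m: float      # USD per 1M uncached input tokens
    output_usd_per_m: float     # USD per 1M output tokens

    def blended_usd_per_m(self) -> float:
        # Assumed 7:2:1 weighted average of cache-hit, input, and output prices.
        return (7 * self.cache_hit_usd_per_m
                + 2 * self.input_usd_per_m
                + 1 * self.output_usd_per_m) / 10


def attractive_providers(points: List[ProviderPoint]) -> List[str]:
    """Providers at least as fast as the median speed and no more expensive
    than the median blended price.

    Assumption: quadrant boundaries at the medians; the real chart may differ.
    """
    speed_cut = median(p.output_tps for p in points)
    price_cut = median(p.blended_usd_per_m() for p in points)
    return [p.name for p in points
            if p.output_tps >= speed_cut and p.blended_usd_per_m() <= price_cut]


# Hypothetical placeholder data, not Artificial Analysis measurements.
HYPOTHETICAL = [
    ProviderPoint("provider-a", 150, 0.15, 0.60, 2.50),
    ProviderPoint("provider-b", 90, 0.20, 0.95, 4.00),
    ProviderPoint("provider-c", 40, 0.10, 0.50, 2.00),
]

if __name__ == "__main__":
    for p in HYPOTHETICAL:
        print(f"{p.name}: {p.output_tps} tok/s, "
              f"${p.blended_usd_per_m():.3f} per 1M blended")
    print("most attractive quadrant:", attractive_providers(HYPOTHETICAL))
```

A heavy weight on cache-hit pricing rewards providers whose cached input is cheap. This matches agent workloads that resend long, mostly unchanged contexts.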
Speed vs. Price for Kimi K2.6 (source: Artificial Analysis). Artificial Analysis's Speed vs. Price view for Kimi K2.6 plots output speed against a blended 7:2:1 cache-hit, input, and output token price across 11 providers. This is an independent third-party measurement.

The winning optimization: training and deploying NVFP4 and EAGLE3 on NVIDIA GB300

The CoreWeave Applied Training team, an internal research group of domain experts, uses our NVIDIA GB300 and GB200 NVL72 clusters alongside advanced optimization techniques, including NVIDIA Model Optimizer post-training quantization (PTQ), to train and fine-tune the latest AI models. Every experiment is backed by rigorous quality and accuracy evaluations using Weights & Biases and modern test-harness frameworks.

Training a custom NVFP4-quantized model and adding EAGLE3 speculative decoding produced the largest performance gains on Kimi K2.6. We validated model quality across a range of notable benchmarks, including Terminal-Bench 2.0, SWE-Bench Verified, SWE-Bench Pro, GPQA Diamond, AIME 2026, and LiveCodeBench v6. Validating the NVFP4 weights across these benchmarks is essential to improving model performance without degrading quality.

Beyond model-level optimization, CoreWeave builds performance enhancements into every layer of the CoreWeave Inference stack. This covers bare-metal access to the latest NVIDIA accelerators and high-speed interconnects, optimized memory architectures, and custom tuning of the inference stack, all engineered to deliver industry-leading inference performance.

Why price-performance matters for inference as enterprises scale AI

In production inference, every lever that improves performance also affects cost: GPU selection, quantization, attention backend, KV cache layout, batch sizing, speculative decoding, and kernels. Real-time, customer-facing workloads demand low latency, while offline workloads such as evaluations demand high throughput at low cost.
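Quantization is one of the levers listed above. The following Python sketch illustrates the core idea of NVFP4, which uses block-scaled 4-bit floats. Each block of 16 values shares one scale, and each value is rounded to the nearest FP4 (E2M1) magnitude. This is not NVIDIA Model Optimizer's PTQ implementation and differs from it in three ways. It keeps each block scale in full precision instead of FP8 (E4M3). It omits the second-level per-tensor scale. It breaks rounding ties toward the smaller magnitude. The `fake_quantize` and `weight_gigabytes` helpers are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Non-negative magnitudes representable by FP4 E2M1 (sign stored separately).
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
E2M1_MAX = 6.0
BLOCK_SIZE = 16  # NVFP4 shares one scale across each block of 16 values


def quantize_block(block: Sequence[float]) -> Tuple[float, List[float]]:
    """Return (scale, fp4_values) for one block.

    Simplification: the scale stays a Python float; NVFP4 stores it as FP8 E4M3.
    """
    amax = max(abs(x) for x in block)
    scale = amax / E2M1_MAX if amax > 0 else 1.0
    codes = []
    for x in block:
        target = abs(x) / scale
        # Nearest E2M1 magnitude; on a tie, min() keeps the smaller one.
        mag = min(E2M1_MAGNITUDES, key=lambda q: abs(target - q))
        codes.append(math.copysign(mag, x))
    return scale, codes


def fake_quantize(values: Sequence[float]) -> List[float]:
    """Quantize then dequantize block by block, to inspect rounding error."""
    out: List[float] = []
    for start in range(0, len(values), BLOCK_SIZE):
        scale, codes = quantize_block(values[start:start + BLOCK_SIZE])
        out.extend(scale * c for c in codes)
    return out


def weight_gigabytes(num_params: float, bits_per_param: float) -> float:
    """Storage for weights alone, in GB (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    weights = [math.sin(i) * 0.02 for i in range(64)]  # toy weights
    approx = fake_quantize(weights)
    worst = max(abs(w - a) for w, a in zip(weights, approx))
    print(f"max abs error: {worst:.5f}")
    # 4-bit values plus one 8-bit scale per 16 values = 4.5 bits per weight.
    print(f"1T params, NVFP4: {weight_gigabytes(1e12, 4 + 8 / BLOCK_SIZE):.1f} GB")
    print(f"1T params, BF16:  {weight_gigabytes(1e12, 16):.1f} GB")
```

At about 4.5 bits per parameter, the 1 trillion weights of Kimi K2.6 occupy roughly 562.5 GB, compared with about 2,000 GB in BF16. That comparison excludes KV cache and activations. Rounding error of this kind is why the NVFP4 weights must be validated on quality benchmarks.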
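Speculative decoding is another lever on that list. The Python sketch below shows the generic greedy draft-then-verify loop. A cheap draft model proposes k tokens, the target model checks them, and the matching prefix is kept along with one token chosen by the target. The sketch does not model what distinguishes EAGLE3, such as a draft head driven by the target model's internal features and tree-structured candidates. The toy `toy_target` and `toy_draft` functions are hypothetical stand-ins for real models.

```python
from typing import Callable, List, Sequence, Tuple

# A "model" here is any function mapping a token context to its greedy next token.
NextToken = Callable[[Sequence[int]], int]


def greedy_generate(target: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: one sequential target step per new token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def speculative_generate(target: NextToken, draft: NextToken, prompt: List[int],
                         max_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy draft-then-verify decoding. Returns (tokens, verification_rounds)."""
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < max_new:
        # 1) The cheap draft model proposes k tokens autoregressively.
        proposal, ctx = [], list(out)
        for _ in range(k):
            token = draft(ctx)
            proposal.append(token)
            ctx.append(token)
        # 2) The target checks each proposed position. A real server scores all
        #    k positions in one batched forward pass; this toy calls it per position.
        for token in proposal:
            expected = target(out)
            out.append(expected)  # always keep the target's own choice
            if expected != token:
                break  # first mismatch: discard the rest of the draft
        else:
            out.append(target(out))  # all k accepted: target adds a bonus token
        rounds += 1
    return out[:len(prompt) + max_new], rounds


def toy_target(ctx: Sequence[int]) -> int:
    # Hypothetical deterministic "target model".
    return (ctx[-1] * 3 + 1) % 10


def toy_draft(ctx: Sequence[int]) -> int:
    # Hypothetical draft: agrees with the target except when len(ctx) % 7 == 0.
    guess = toy_target(ctx)
    return (guess + 1) % 10 if len(ctx) % 7 == 0 else guess


if __name__ == "__main__":
    tokens, rounds = speculative_generate(toy_target, toy_draft, [2], max_new=20)
    assert tokens == greedy_generate(toy_target, [2], max_new=20)
    print(f"{rounds} verification rounds for 20 new tokens")
```

Every kept token is the target's own greedy choice, so the output matches plain greedy decoding. The savings come from checking several drafted tokens per target pass. In this toy run, that takes 6 verification rounds instead of 20 sequential steps.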
CoreWeave Inference lets customers tune cost-performance for each workload, with options ranging from token-based consumption to GPU-based scaling. Serverless Inference provides the fastest path to deployment, giving developers access to a broad selection of models through a single unified API. Dedicated Inference offers more flexibility: customers optimize performance through explicit GPU selection, runtime configuration, and autoscaling, while CoreWeave manages the underlying infrastructure. For maximum control, Inference on CoreWeave Kubernetes Service (CKS) gives customers direct control over GPUs, runtimes, orchestration, and capacity management.

Try Kimi K2.6 on CoreWeave Inference now.
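The choice between token-based and GPU-based consumption often depends on sustained utilization. This Python sketch converts a dedicated deployment's hourly GPU cost and aggregate throughput into a price per million tokens. It also finds the throughput at which that cost matches a per-token price. All prices, GPU counts, and throughputs are hypothetical and are not CoreWeave pricing. The sketch also assumes the dedicated capacity is fully busy around the clock.

```python
SECONDS_PER_HOUR = 3600


def usd_per_million_tokens(gpu_usd_per_hour: float, num_gpus: int,
                           tokens_per_second: float) -> float:
    """Effective price of a dedicated deployment, assuming it runs fully busy."""
    hourly_cost = gpu_usd_per_hour * num_gpus
    tokens_per_hour = tokens_per_second * SECONDS_PER_HOUR
    return hourly_cost / tokens_per_hour * 1_000_000


def breakeven_tokens_per_second(gpu_usd_per_hour: float, num_gpus: int,
                                per_token_usd_per_million: float) -> float:
    """Sustained throughput at which dedicated GPUs cost the same as per-token billing."""
    hourly_cost = gpu_usd_per_hour * num_gpus
    return hourly_cost / per_token_usd_per_million * 1_000_000 / SECONDS_PER_HOUR


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    print(f"${usd_per_million_tokens(10.0, 8, 10_000):.2f} per 1M tokens")
    print(f"break-even at {breakeven_tokens_per_second(10.0, 8, 1.00):,.0f} tokens/s")
```

With these made-up inputs, eight GPUs at $10 per GPU-hour serving 10,000 tokens/s cost about $2.22 per million tokens. They only undercut a $1.00-per-million per-token price above roughly 22,222 tokens/s of sustained throughput. Optimizations that raise throughput, such as quantization and speculative decoding, lower that threshold.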
[1] Price-performance is measured as speed vs. price by Artificial Analysis.
CoreWeave Inference leads Artificial Analysis benchmarks for Kimi K2.6 output speed and ranks in the most attractive price-performance quadrant.
© CoreWeave. 290 W Mt Pleasant Ave Suite 4100 Livingston, NJ…