Fork · CoreWeave · published Jan 4, 2024

coreweave/vllm

forked from vllm-project/vllm

Description: A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python

License: Apache-2.0

Stars: 0

Forks: 1

Open issues: 1

Created: 2024-01-04T16:53:47Z

Pushed: 2025-08-12T16:09:51Z

Default branch: main

Fork: yes

Parent repository: vllm-project/vllm

Archived: no

README:

Easy, fast, and cheap LLM serving for everyone

| Documentation | Blog | Paper | Twitter/X | User Forum | Developer Slack |

---

*Latest News* 🔥

  • [2025/05] We hosted the NYC vLLM Meetup! Please find the meetup slides here.
  • [2025/05] vLLM is now a hosted project under the PyTorch Foundation! Please find the announcement here.
  • [2025/04] We hosted Asia Developer Day! Please find the meetup slides from the vLLM team here.
  • [2025/01] We are excited to announce the alpha release of vLLM V1: A major architectural upgrade with 1.7x speedup! Clean code, optimized execution loop, zero-overhead prefix caching, enhanced multimodal support, and more. Please check out our blog post here.

---

About

vLLM is a fast and easy-to-use library for LLM inference and serving.

Originally developed in the Sky Computing Lab at UC Berkeley, vLLM has evolved into a community-driven project with contributions from both academia and industry.

vLLM is fast with:

  • State-of-the-art serving throughput
  • Efficient management of attention key and value memory with…
