coreweave/vllm
forked from vllm-project/vllm
Description: A high-throughput and memory-efficient inference and serving engine for LLMs
Language: Python
License: Apache-2.0
Stars: 0
Forks: 1
Open issues: 1
Created: 2024-01-04T16:53:47Z
Pushed: 2025-08-12T16:09:51Z
Default branch: main
Fork: yes
Parent repository: vllm-project/vllm
Archived: no
README:
Easy, fast, and cheap LLM serving for everyone
| Documentation | Blog | Paper | Twitter/X | User Forum | Developer Slack |
---
*Latest News* 🔥
- [2025/05] We hosted the NYC vLLM Meetup! Please find the meetup slides here.
- [2025/05] vLLM is now a hosted project under the PyTorch Foundation! Please find the announcement here.
- [2025/04] We hosted the Asia Developer Day! Please find the meetup slides from the vLLM team here.
- [2025/01] We are excited to announce the alpha release of vLLM V1, a major architectural upgrade with a 1.7x speedup! It brings clean code, an optimized execution loop, zero-overhead prefix caching, enhanced multimodal support, and more. Please check out our blog post here.
Previous News
- [2025/03] We hosted vLLM x Ollama Inference Night! Please find the meetup slides from the vLLM team here.
- [2025/03] We hosted the first vLLM China Meetup! Please find the meetup slides from the vLLM team here.
- [2025/03] We hosted the East Coast vLLM Meetup! Please find the meetup slides here.
- [2025/02] We hosted the ninth vLLM meetup with Meta! Please find the meetup slides from the vLLM team here and from AMD here. The slides from Meta will not be posted.
- [2025/01] We hosted the eighth vLLM meetup with Google Cloud! Please find the meetup slides from the vLLM team here and from the Google Cloud team here.
- [2024/12] vLLM joins the PyTorch ecosystem! Easy, Fast, and Cheap LLM Serving for Everyone!
- [2024/11] We hosted the seventh vLLM meetup with Snowflake! Please find the meetup slides from the vLLM team here and from the Snowflake team here.
- [2024/10] We have just created a developer slack (slack.vllm.ai) focusing on coordinating contributions and discussing features. Please feel free to join us there!
- [2024/10] Ray Summit 2024 held a special track for vLLM! Please find the opening talk slides from the vLLM team here. Learn more from the talks by other vLLM contributors and users!
- [2024/09] We hosted the sixth vLLM meetup with NVIDIA! Please find the meetup slides here.
- [2024/07] We hosted the fifth vLLM meetup with AWS! Please find the meetup slides here.
- [2024/07] In partnership with Meta, vLLM officially supports Llama 3.1 with FP8 quantization and pipeline parallelism! Please check out our blog post here.
- [2024/06] We hosted the fourth vLLM meetup with Cloudflare and BentoML! Please find the meetup slides here.
- [2024/04] We hosted the third vLLM meetup with Roblox! Please find the meetup slides here.
- [2024/01] We hosted the second vLLM meetup with IBM! Please find the meetup slides here.
- [2023/10] We hosted the first vLLM meetup with a16z! Please find the meetup slides here.
- [2023/08] We would like to express our sincere gratitude to Andreessen Horowitz (a16z) for providing a generous grant to support the open-source development and research of vLLM.
- [2023/06] We officially released vLLM! FastChat-vLLM integration has powered LMSYS Vicuna and Chatbot Arena since mid-April. Check out our blog post.
---
About
vLLM is a fast and easy-to-use library for LLM inference and serving.
Originally developed in the Sky Computing Lab at UC Berkeley, vLLM has evolved into a community-driven project with contributions from both academia and industry.
vLLM is fast with:
- State-of-the-art serving throughput
- Efficient management of attention key and value memory with…