MiniMax-AI/vllm
forked from vllm-project/vllm
Description: A high-throughput and memory-efficient inference and serving engine for LLMs
Language: Python
License: Apache-2.0
Stars: 17
Forks: 11
Open issues: 0
Created: 2025-04-21T12:30:16Z
Pushed: 2025-10-26T16:08:09Z
Default branch: main
Fork: yes
Parent repository: vllm-project/vllm
Archived: no
README:
Easy, fast, and cheap LLM serving for everyone
| Documentation | Blog | Paper | Twitter/X | User Forum | Developer Slack |
---
Join us at the PyTorch Conference, October 22-23 and Ray Summit, November 3-5 in San Francisco for our latest updates on vLLM and to meet the vLLM team! Register now for the largest vLLM community events of the year!
---
*Latest News* 🔥
- [2025/09] We hosted vLLM Toronto Meetup focused on tackling inference at scale and speculative decoding with speakers from NVIDIA and Red Hat! Please find the meetup slides here.
- [2025/08] We hosted vLLM Shenzhen Meetup focusing on the ecosystem around vLLM! Please find the meetup slides here.
- [2025/08] We hosted vLLM Singapore Meetup. We shared V1 updates, disaggregated serving and MLLM speedups with speakers from Embedded LLM, AMD, WekaIO, and A*STAR. Please find the meetup slides here.
- [2025/08] We hosted vLLM Shanghai Meetup focusing on building, developing, and integrating with vLLM! Please find the meetup slides here.
- [2025/05] vLLM is now a hosted project under PyTorch Foundation! Please find the announcement here.
- [2025/01] We are excited to announce the alpha release of vLLM V1: A major architectural upgrade with 1.7x speedup! Clean code, optimized execution loop, zero-overhead prefix caching, enhanced multimodal support, and more. Please check out our blog post here.
Previous News
- [2025/08] We hosted vLLM Korea Meetup with Red Hat and Rebellions! We shared the latest advancements in vLLM along with project spotlights from the vLLM Korea community. Please find the meetup slides here.
- [2025/08] We hosted vLLM Beijing Meetup focusing on large-scale LLM deployment! Please find the meetup slides here and the recording here.
- [2025/05] We hosted NYC vLLM Meetup! Please find the meetup slides here.
- [2025/04] We hosted Asia Developer Day! Please find the meetup slides from the vLLM team here.
- [2025/03] We hosted vLLM x Ollama Inference Night! Please find the meetup slides from the vLLM team here.
- [2025/03] We hosted the first vLLM China Meetup! Please find the meetup slides from the vLLM team here.
- [2025/03] We hosted the East Coast vLLM Meetup! Please find the meetup slides here.
- [2025/02] We hosted the ninth vLLM meetup with Meta! Please find the meetup slides from the vLLM team here and AMD here. The slides from Meta will not be posted.
- [2025/01] We hosted the eighth vLLM meetup with Google Cloud! Please find the meetup slides from the vLLM team here, and the Google Cloud team here.
- [2024/12] vLLM joins the PyTorch ecosystem! Easy, Fast, and Cheap LLM Serving for Everyone!
- [2024/11] We hosted the seventh vLLM meetup with Snowflake! Please find the meetup slides from the vLLM team here, and the Snowflake team here.
- [2024/10] We have just created a developer Slack (slack.vllm.ai) focused on coordinating contributions and discussing features. Please feel free to join us there!
- [2024/10] Ray Summit 2024 held a special track for vLLM! Please find the opening talk slides from the vLLM team here. Learn more from the talks from other vLLM contributors and users!
- [2024/09] We hosted the sixth vLLM meetup with NVIDIA! Please find the meetup slides here.
- [2024/07] We hosted the fifth vLLM meetup with AWS! Please find the…
Notability: 1.0/10. Routine fork with low stars.