novitalabs/aigw
forked from aigw-project/aigw
Description: The Intelligent Inference Scheduler for Large-scale Inference Services.
License: Apache-2.0
Stars: 0
Forks: 0
Open issues: 1
Created: 2025-12-29T05:42:30Z
Pushed: 2025-12-28T01:55:45Z
Default branch: main
Fork: yes
Parent repository: aigw-project/aigw
Archived: no
README:
The Intelligent Inference Scheduler for Large-scale Inference Services
About
AIGW is an intelligent inference scheduler for large-scale inference services. It provides intelligent routing, overload protection, and multi-tenant QoS through a global routing solution that is aware of load, KVCache, and LoRA. This helps achieve higher throughput, lower latency, and more efficient use of resources.
Status
Early stage, under rapid development.
Architecture
![Architecture](docs/images/architecture.png)
Highlights
1. A flexible, powerful, and easy-to-maintain Envoy Golang extension
2. Near real-time load metric collection
3. A balanced multi-factor composite decision-making algorithm
4. A highly available architecture that supports horizontal scaling
Developer Guide
[Developer Guide](docs/en/developer-guide.md)
Community
AIGW is built on Envoy and Istio. We are sincerely grateful to both projects.
Roadmap
1. Precise cache-awareness
2. SLO-aware algorithm based on latency prediction
3. PD separation scheduling
4. DP-level scheduling
License
This project is licensed under the [Apache 2.0](LICENSE) License.
Notability
1.0/10 (routine fork, no notable activity)