Fork · Novita AI · published Dec 29, 2025

novitalabs/aigw

forked from aigw-project/aigw


Description: The Intelligent Inference Scheduler for Large-scale Inference Services.

License: Apache-2.0

Stars: 0

Forks: 0

Open issues: 1

Created: 2025-12-29T05:42:30Z

Pushed: 2025-12-28T01:55:45Z

Default branch: main

Fork: yes

Parent repository: aigw-project/aigw

Archived: no

README:

The Intelligent Inference Scheduler for Large-scale Inference Services

About

AIGW is an intelligent inference scheduler for large-scale inference services. It provides intelligent routing, overload protection, and multi-tenant QoS through a global routing solution that is aware of load, KV cache state, and LoRA adapters. This helps achieve higher throughput, lower latency, and more efficient use of resources.
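To make the overload protection and multi-tenant QoS ideas concrete, here is a minimal sketch of a per-tenant in-flight request limiter. It is not taken from the AIGW codebase: the `TenantLimiter` type, its methods, and the tenant names are all hypothetical, and a real scheduler would likely combine such limits with the load signals mentioned above.

```go
package main

import (
	"fmt"
	"sync"
)

// TenantLimiter is a hypothetical admission controller that caps the
// number of in-flight requests per tenant. It is an illustration only,
// not AIGW's actual mechanism.
type TenantLimiter struct {
	mu       sync.Mutex
	limits   map[string]int // per-tenant cap on in-flight requests
	inflight map[string]int // current in-flight count per tenant
	fallback int            // cap for tenants without an explicit limit
}

// NewTenantLimiter builds a limiter from explicit limits and a fallback cap.
func NewTenantLimiter(limits map[string]int, fallback int) *TenantLimiter {
	return &TenantLimiter{
		limits:   limits,
		inflight: make(map[string]int),
		fallback: fallback,
	}
}

// Acquire admits a request if the tenant is under its cap.
// It returns false when the request should be rejected or queued.
func (l *TenantLimiter) Acquire(tenant string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	limit, ok := l.limits[tenant]
	if !ok {
		limit = l.fallback
	}
	if l.inflight[tenant] >= limit {
		return false
	}
	l.inflight[tenant]++
	return true
}

// Release marks one of the tenant's requests as finished.
func (l *TenantLimiter) Release(tenant string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.inflight[tenant] > 0 {
		l.inflight[tenant]--
	}
}

func main() {
	limiter := NewTenantLimiter(map[string]int{"gold": 2}, 1)
	for i := 0; i < 3; i++ {
		fmt.Println("gold admitted:", limiter.Acquire("gold"))
	}
}
```

A caller would pair each successful `Acquire` with a `Release` once the response completes, so that one tenant's burst cannot starve the others.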

Status

Early stage and under rapid development

Architecture

[![Architecture](docs/images/architecture.png)](docs/images/architecture.png)

Highlights

1. A flexible, powerful, and easy-to-maintain Envoy Golang extension
2. Near real-time load metric collection
3. A balanced multi-factor composite decision-making algorithm
4. A highly available architecture that supports horizontal scaling
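The README does not spell out the composite decision-making algorithm, so the following is only a sketch of what a multi-factor score could look like. It assumes three signals (queue load, expected KV cache prefix hit ratio, and whether a LoRA adapter is already loaded) combined with a weighted sum. The `Endpoint`, `Weights`, `Score`, and `Pick` names, the fields, and the weight values are all hypothetical.

```go
package main

import "fmt"

// Endpoint is a hypothetical snapshot of one inference backend.
type Endpoint struct {
	Name           string
	QueueDepth     int             // pending requests (load signal)
	MaxQueue       int             // capacity used to normalize load
	PrefixHitRatio float64         // expected share of the prompt in KV cache, 0..1
	LoadedLoRAs    map[string]bool // adapters currently resident on the backend
}

// Weights are illustrative tuning knobs, not values used by AIGW.
type Weights struct {
	Load, Cache, LoRA float64
}

// Score combines the three signals into one number; higher is better.
func Score(e Endpoint, lora string, w Weights) float64 {
	load := 1.0 // treat unknown capacity as fully loaded
	if e.MaxQueue > 0 {
		load = float64(e.QueueDepth) / float64(e.MaxQueue)
		if load > 1 {
			load = 1
		}
	}
	s := w.Load*(1-load) + w.Cache*e.PrefixHitRatio
	if lora != "" && e.LoadedLoRAs[lora] {
		s += w.LoRA // bonus for avoiding an adapter load
	}
	return s
}

// Pick returns the highest-scoring endpoint, or false if none exist.
func Pick(eps []Endpoint, lora string, w Weights) (Endpoint, bool) {
	if len(eps) == 0 {
		return Endpoint{}, false
	}
	best, bestScore := eps[0], Score(eps[0], lora, w)
	for _, e := range eps[1:] {
		if s := Score(e, lora, w); s > bestScore {
			best, bestScore = e, s
		}
	}
	return best, true
}

func main() {
	eps := []Endpoint{
		{Name: "a", QueueDepth: 2, MaxQueue: 10, PrefixHitRatio: 0.1},
		{Name: "b", QueueDepth: 6, MaxQueue: 10, PrefixHitRatio: 0.9,
			LoadedLoRAs: map[string]bool{"sql": true}},
	}
	w := Weights{Load: 0.5, Cache: 0.3, LoRA: 0.2}
	if e, ok := Pick(eps, "sql", w); ok {
		fmt.Println("selected:", e.Name)
	}
}
```

In this sketch a busier backend can still win when its cache and adapter state save more work than the extra queueing costs, which is the kind of trade-off a balanced multi-factor policy has to make.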

Developer Guide

[Developer Guide](docs/en/developer-guide.md)

Community

AIGW is built on Envoy and Istio. We are sincerely grateful to both projects.

Roadmap

1. Precise cache-awareness
2. SLO-aware algorithm based on latency prediction
3. PD (prefill/decode) separation scheduling
4. DP (data-parallel) level scheduling

License

This project is licensed under the [Apache 2.0](LICENSE) License.

Notability

notability 1.0/10

routine fork, no notable activity