# Auto AI Router
High-performance proxy router for LLM APIs with automatic load balancing, rate limiting, and fail2ban protection.
## Overview
Auto AI Router acts as a unified gateway between your applications and multiple LLM providers. It handles credential rotation, rate limit enforcement, and automatic failover — so your applications only need a single endpoint.
```mermaid
graph LR
    Client["<b>Client</b><br/>(OpenAI format)"]
    Router{{"<b>Auto AI Router</b><br/>- Load balancing<br/>- Rate limiting<br/>- Fail2ban"}}
    subgraph Providers ["Backend Providers"]
        OpenAI[OpenAI]
        Vertex[Vertex AI]
        Anthropic[Anthropic]
        Gemini[Gemini]
        Proxy[Proxy]
    end
    Client <--> Router
    Router --> OpenAI
    Router --> Vertex
    Router --> Anthropic
    Router --> Gemini
    Router --> Proxy
```
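Because clients talk to the router in the OpenAI format, any OpenAI-compatible SDK can be pointed at it directly. A minimal sketch using the official `openai` Python package; the base URL, port, and API key below are placeholders for illustration, not documented defaults:

```python
from openai import OpenAI

# Point a standard OpenAI client at the router instead of api.openai.com.
# Address and key are illustrative; use your deployment's values.
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="sk-router-key",
)

response = client.chat.completions.create(
    model="gpt-4o",  # the router maps the model name to a backend provider
    messages=[{"role": "user", "content": "Hello through the router!"}],
)
print(response.choices[0].message.content)
```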
## Features
- Multi-provider routing — OpenAI, Vertex AI, Gemini AI Studio, Anthropic
- Proxy chains — forward to other Auto AI Router instances as fallback
- Round-robin balancing — distribute load across multiple credentials
- Two-level rate limiting — per-credential RPM/TPM + per-model limits
- Fail2ban protection — auto-ban credentials on repeated errors (a sketch of the balancing, rate-limit, and ban logic follows this list)
- Prometheus metrics — request counts, latency, credential utilization
- Health dashboard — JSON API and HTML dashboard at `/health` and `/vhealth`
- LiteLLM DB integration — spend logging, daily aggregation, API key auth
- SSE streaming — full streaming support for all providers (see the streaming example below)
- Secure config — environment variable resolution via `os.environ/VAR_NAME`
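To make the balancing and fail2ban features concrete, here is a minimal Python sketch of the general technique: credentials rotate round-robin, a credential is skipped while it is over its per-minute budget, and repeated failures ban it for a cooldown window. All names and thresholds here are assumptions for illustration, not the router's actual implementation.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Credential:
    key: str
    rpm_limit: int                                   # per-credential requests/minute
    timestamps: list = field(default_factory=list)   # recent request times
    errors: int = 0                                  # consecutive error count
    banned_until: float = 0.0                        # fail2ban expiry (monotonic secs)

class RoundRobinPool:
    """Illustrative credential pool: round-robin + RPM limit + fail2ban."""

    def __init__(self, creds, max_errors=3, ban_seconds=300):
        self.creds = creds
        self.max_errors = max_errors
        self.ban_seconds = ban_seconds
        self.index = 0

    def acquire(self):
        now = time.monotonic()
        for _ in range(len(self.creds)):
            cred = self.creds[self.index]
            self.index = (self.index + 1) % len(self.creds)
            if now < cred.banned_until:
                continue  # still banned by fail2ban
            # Keep only requests from the last 60 seconds.
            cred.timestamps = [t for t in cred.timestamps if now - t < 60]
            if len(cred.timestamps) >= cred.rpm_limit:
                continue  # over its per-minute budget
            cred.timestamps.append(now)
            return cred
        raise RuntimeError("all credentials banned or rate-limited")

    def report(self, cred, ok):
        if ok:
            cred.errors = 0
        elif (cred.errors := cred.errors + 1) >= self.max_errors:
            # Auto-ban after repeated errors, then reset the counter.
            cred.banned_until = time.monotonic() + self.ban_seconds
            cred.errors = 0
```

The two-level rate limiting in the feature list layers per-model limits on top of the per-credential RPM/TPM budget sketched here.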
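Streaming uses the same endpoint; with the OpenAI SDK, passing `stream=True` yields the SSE chunks as they arrive (placeholders as above):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-router-key")

# The router relays the backend provider's SSE stream chunk by chunk.
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a short haiku."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```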
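And a quick liveness check against the JSON health endpoint mentioned above; the response shape is deployment-specific, so it is simply pretty-printed here:

```python
import json
import urllib.request

# /health serves the JSON status; /vhealth serves the HTML dashboard.
with urllib.request.urlopen("http://localhost:8080/health") as resp:
    print(json.dumps(json.load(resp), indent=2))
```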
## Getting Started
- Installation — build from source or use Docker
- Configuration — set up providers and credentials
- API Usage — make your first request