Configuration
Auto AI Router is configured via a YAML file passed with the -config flag.
See the full example in config.yaml.example.
Full Example
server:
port: 8080
max_body_size_mb: 100
response_body_multiplier: 10
request_timeout: 60s
write_timeout: 60s
idle_timeout: 2m
idle_conn_timeout: 120s
max_idle_conns: 200
max_idle_conns_per_host: 20
logging_level: info
master_key: "sk-your-master-key-here"
default_models_rpm: -1
model_prices_link: ""
fail2ban:
max_attempts: 3
ban_duration: permanent
error_codes: [401, 403, 429, 500, 502, 503, 504]
# error_code_rules:
# - code: 429
# max_attempts: 5
# ban_duration: 5m
monitoring:
prometheus_enabled: true
log_errors: false
errors_log_path: "logs/logs.jsonl"
credentials:
- name: "openai_main"
type: "openai"
api_key: "sk-proj-xxxxx"
base_url: "https://api.openai.com"
rpm: 100
tpm: 50000
- name: "vertex_ai"
type: "vertex-ai"
project_id: "your-project-id"
location: "global"
credentials_file: "path/to/service-account.json"
rpm: 100
tpm: 50000
- name: "gemini_studio"
type: "gemini"
api_key: "os.environ/GEMINI_API_KEY"
base_url: "https://generativelanguage.googleapis.com"
rpm: 60
tpm: -1
- name: "proxy_fallback"
type: "proxy"
base_url: "http://backup-router.local:8080"
api_key: "sk-remote-master-key"
rpm: 200
tpm: 100000
is_fallback: true
models:
- name: "gpt-4o"
credential: openai_main
rpm: 100
tpm: 50000
- name: "gemini-2.5-pro"
credential: vertex_ai
rpm: 100
tpm: 50000
litellm_db:
enabled: false
is_required: false
database_url: "os.environ/LITELLM_DATABASE_URL"
max_conns: 25
min_conns: 5
health_check_interval: 10s
connect_timeout: 5s
auth_cache_ttl: 20s
auth_cache_size: 10000
log_queue_size: 5000
log_batch_size: 100
log_flush_interval: 5s
log_retry_attempts: 3
log_retry_delay: 1s
Server Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
port |
int | 8080 | Listen port |
max_body_size_mb |
int | 100 | Maximum request body size (MB) |
response_body_multiplier |
int | 10 | Response body limit = max_body_size_mb * this value |
request_timeout |
duration | 60s | Request timeout |
write_timeout |
duration | 60s | HTTP server write timeout |
idle_timeout |
duration | 2m | HTTP server idle timeout (default: 2 * write_timeout) |
idle_conn_timeout |
duration | 120s | Idle connection timeout for keep-alive connections |
max_idle_conns |
int | 200 | Maximum idle connections |
max_idle_conns_per_host |
int | 20 | Maximum idle connections per host |
logging_level |
string | info | Logging level: info, debug, error |
master_key |
string | — | Required. Master key for client authentication |
default_models_rpm |
int | -1 | Default RPM limit for models (-1 = unlimited) |
model_prices_link |
string | — | URL or file path to model prices JSON |
Fail2Ban Parameters
| Parameter | Type | Description |
|---|---|---|
max_attempts |
int | Maximum failed attempts before banning a credential |
ban_duration |
string | Ban duration (permanent for permanent, or duration like 5m, 1h) |
error_codes |
[]int | HTTP status codes that trigger ban counting |
error_code_rules |
[]rule | Per-error-code override rules (see example below) |
Per-Error-Code Rules
Override max_attempts and ban_duration for specific error codes:
fail2ban:
max_attempts: 3
ban_duration: permanent
error_codes: [401, 403, 429, 500, 502, 503, 504]
error_code_rules:
- code: 429 # Rate limit errors
max_attempts: 5
ban_duration: 5m
Monitoring Parameters
| Parameter | Type | Description |
|---|---|---|
prometheus_enabled |
bool | Enable Prometheus metrics on /metrics |
log_errors |
bool | Enable error logging to file |
errors_log_path |
string | Path to error log file |
Note
The /health endpoint is always available and cannot be disabled or reconfigured.
Credentials
Each credential defines a connection to an LLM provider. See Providers for details on each type.
Common fields for all credentials:
| Field | Type | Description |
|---|---|---|
name |
string | Unique credential identifier |
type |
string | Provider type: openai, anthropic, vertex-ai, gemini, proxy |
rpm |
int | Requests per minute limit (-1 = unlimited) |
tpm |
int | Tokens per minute limit (-1 = unlimited) |
is_fallback |
bool | Use as fallback when primary credentials are exhausted |
Models
The models section binds specific models to credentials and optionally sets per-model rate limits.
By default, all models are available through all credentials. Use the models section to restrict which credentials serve which models.
See Load Balancing for details on multi-credential routing.