# Configuration

Auto AI Router is configured via a YAML file passed with the `-config` flag.
See the full example in `config.yaml.example`.
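Only a few fields are strictly needed to get started. A minimal configuration might look like the sketch below (the API keys are placeholders, and the assumption that every omitted field has a usable default should be checked against `config.yaml.example`):

```yaml
server:
  master_key: "sk-your-master-key-here"  # required: clients authenticate with this key

credentials:
  - name: "openai_main"
    type: "openai"
    api_key: "sk-proj-xxxxx"             # placeholder provider key
    base_url: "https://api.openai.com"
```

Start the router with `-config` pointing at that file.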
## Full Example

```yaml
server:
  port: 8080
  max_body_size_mb: 100
  response_body_multiplier: 10
  request_timeout: 60s
  write_timeout: 60s
  idle_timeout: 2m
  idle_conn_timeout: 120s
  max_idle_conns: 200
  max_idle_conns_per_host: 20
  logging_level: info
  master_key: "sk-your-master-key-here"
  default_models_rpm: -1
  model_prices_link: ""

fail2ban:
  max_attempts: 3
  ban_duration: permanent
  error_codes: [401, 403, 429, 500, 502, 503, 504]
  # error_code_rules:
  #   - code: 429
  #     max_attempts: 5
  #     ban_duration: 5m

monitoring:
  prometheus_enabled: true
  log_errors: false
  errors_log_path: "logs/logs.jsonl"

credentials:
  - name: "openai_main"
    type: "openai"
    api_key: "sk-proj-xxxxx"
    base_url: "https://api.openai.com"
    rpm: 100
    tpm: 50000

  - name: "vertex_ai"
    type: "vertex-ai"
    project_id: "your-project-id"
    location: "global"
    credentials_file: "path/to/service-account.json"
    rpm: 100
    tpm: 50000

  - name: "gemini_studio"
    type: "gemini"
    api_key: "os.environ/GEMINI_API_KEY"
    base_url: "https://generativelanguage.googleapis.com"
    rpm: 60
    tpm: -1

  - name: "proxy_fallback"
    type: "proxy"
    base_url: "http://backup-router.local:8080"
    api_key: "sk-remote-master-key"
    rpm: 200
    tpm: 100000
    is_fallback: true

models:
  - name: "gpt-4o"
    credential: openai_main
    rpm: 100
    tpm: 50000
  - name: "gemini-2.5-pro"
    credential: vertex_ai
    rpm: 100
    tpm: 50000

litellm_db:
  enabled: false
  is_required: false
  database_url: "os.environ/LITELLM_DATABASE_URL"
  max_conns: 25
  min_conns: 5
  health_check_interval: 10s
  connect_timeout: 5s
  auth_cache_ttl: 20s
  auth_cache_size: 10000
  log_queue_size: 5000
  log_batch_size: 100
  log_flush_interval: 5s
  log_retry_attempts: 3
  log_retry_delay: 1s
```
## Server Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `port` | int | 8080 | Listen port |
| `max_body_size_mb` | int | 100 | Maximum request body size (MB) |
| `response_body_multiplier` | int | 10 | Response body limit = `max_body_size_mb` * this value |
| `request_timeout` | duration | 60s | Request timeout |
| `write_timeout` | duration | 60s | HTTP server write timeout |
| `idle_timeout` | duration | 2m | HTTP server idle timeout (default: 2 * `write_timeout`) |
| `idle_conn_timeout` | duration | 120s | Idle connection timeout for keep-alive connections |
| `max_idle_conns` | int | 200 | Maximum idle connections |
| `max_idle_conns_per_host` | int | 20 | Maximum idle connections per host |
| `logging_level` | string | info | Logging level: `info`, `debug`, `error` |
| `master_key` | string | — | **Required.** Master key for client authentication |
| `default_models_rpm` | int | -1 | Default RPM limit for models (-1 = unlimited) |
| `model_prices_link` | string | — | URL or file path to model prices JSON |
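As a concrete illustration of how `response_body_multiplier` works: with the values below, request bodies are capped at 100 MB, while response bodies may be up to 100 * 10 = 1000 MB.

```yaml
server:
  max_body_size_mb: 100         # requests capped at 100 MB
  response_body_multiplier: 10  # responses capped at 100 * 10 = 1000 MB
```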
## Fail2Ban Parameters

| Parameter | Type | Description |
|---|---|---|
| `max_attempts` | int | Maximum failed attempts before banning a credential |
| `ban_duration` | string | Ban duration (`permanent` for a permanent ban, or a duration like `5m`, `1h`) |
| `error_codes` | []int | HTTP status codes that trigger ban counting |
| `error_code_rules` | []rule | Per-error-code override rules (see example below) |
### Per-Error-Code Rules

Override `max_attempts` and `ban_duration` for specific error codes:

```yaml
fail2ban:
  max_attempts: 3
  ban_duration: permanent
  error_codes: [401, 403, 429, 500, 502, 503, 504]
  error_code_rules:
    - code: 429  # Rate limit errors
      max_attempts: 5
      ban_duration: 5m
```
## Monitoring Parameters

| Parameter | Type | Description |
|---|---|---|
| `prometheus_enabled` | bool | Enable Prometheus metrics on `/metrics` |
| `log_errors` | bool | Enable error logging to file |
| `errors_log_path` | string | Path to error log file |

> **Note:** The `/health` endpoint is always available and cannot be disabled or reconfigured.
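With `prometheus_enabled: true`, metrics are served on `/metrics`. A Prometheus scrape job for a router listening on the default port might look like the sketch below (the job name and target host are assumptions):

```yaml
scrape_configs:
  - job_name: "auto-ai-router"       # hypothetical job name
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:8080"]  # the router's server.port
```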
## Credentials

Each credential defines a connection to an LLM provider. See Providers for details on each type.

Common fields for all credentials:

| Field | Type | Description |
|---|---|---|
| `name` | string | Unique credential identifier |
| `type` | string | Provider type: `openai`, `anthropic`, `vertex-ai`, `gemini`, `proxy` |
| `rpm` | int | Requests per minute limit (-1 = unlimited) |
| `tpm` | int | Tokens per minute limit (-1 = unlimited) |
| `is_fallback` | bool | Use as fallback when primary credentials are exhausted |
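Secrets need not be stored in the file itself: as the full example shows for `GEMINI_API_KEY`, an `os.environ/VAR_NAME` value is resolved from the environment. A sketch (the variable name here is an assumption):

```yaml
credentials:
  - name: "openai_main"
    type: "openai"
    api_key: "os.environ/OPENAI_API_KEY"  # read from the environment at startup
    base_url: "https://api.openai.com"
    rpm: 100
    tpm: 50000
```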
## Models

The `models` section binds specific models to credentials and optionally sets per-model rate limits.

By default, all models are available through all credentials. Use the `models` section to restrict which credentials serve which models.

Models can also be declared directly inside a credential via its `models:` field; they are automatically extracted and added to the global models list with the credential name pre-filled.

See Load Balancing for details on multi-credential routing.
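For example, declaring the same model name under two credentials would let the router distribute traffic between them. This is only a sketch using credentials from the full example above; see Load Balancing for the authoritative multi-credential syntax:

```yaml
models:
  - name: "gemini-2.5-pro"
    credential: vertex_ai      # primary Vertex AI credential
    rpm: 100
  - name: "gemini-2.5-pro"
    credential: gemini_studio  # second credential serving the same model
    rpm: 60
```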
## YAML Anchors for Models

When many credentials share the same set of models, YAML anchors eliminate repetition. Define a template once with `&anchor-name` and reference it with `*anchor-name`.

### List anchor in `x-model-templates`

The `x-model-templates` top-level key is a dedicated namespace for anchor definitions. It is not processed by the router; its sole purpose is to hold anchors so they can be referenced elsewhere.
```yaml
x-model-templates:
  vertex-base-models: &vertex-base-models
    - name: gemini-2.5-flash
      rpm: 100
      tpm: 50000
    - name: gemini-2.5-pro
      rpm: 50
      tpm: 100000

credentials:
  - name: "vertex_v1"
    type: "vertex-ai"
    project_id: "proj-1"
    location: "global"
    credentials_file: "keys/proj-1.json"
    rpm: 100
    models: *vertex-base-models  # expands to the full list

  - name: "vertex_v2"
    type: "vertex-ai"
    project_id: "proj-2"
    location: "global"
    credentials_file: "keys/proj-2.json"
    rpm: 100
    models: *vertex-base-models  # same list, credential set to "vertex_v2"
```
Each model copy automatically gets the parent credential name injected, so no manual `credential:` field is needed inside the template.
### Single-model anchor

An anchor can also target a single model mapping and be used as an item in a `models:` list:

```yaml
x-model-templates:
  flash: &flash
    name: gemini-2.5-flash
    rpm: 100
    tpm: 50000

credentials:
  - name: "vertex_v1"
    type: "vertex-ai"
    project_id: "proj-1"
    location: "global"
    credentials_file: "keys/proj-1.json"
    rpm: 100
    models:
      - *flash                  # single model from anchor
      - name: gemini-2.5-pro    # inline model
        rpm: 50
        tpm: 100000
```
### Expanding a list anchor inside the top-level `models:` section

A list anchor can be expanded inline within the top-level `models:` sequence. The router flattens the result so all items end up as a flat list:

```yaml
x-model-templates:
  shared-models: &shared-models
    - name: gemini-2.5-flash
      credential: vertex_v1
      rpm: 100
      tpm: 50000
    - name: gemini-2.5-pro
      credential: vertex_v1
      rpm: 50
      tpm: 100000

models:
  - *shared-models  # expands and flattens both items into the list
  - name: gpt-4o
    credential: openai_main
    rpm: 60
    tpm: 80000
```
### Supported combinations

| Syntax | Location | Result |
|---|---|---|
| `models: *list-anchor` | inside a credential | list items added with that credential name |
| `- *list-anchor` | inside a credential's `models:` | list items added with that credential name |
| `- *single-model-anchor` | inside a credential's `models:` | single model added with that credential name |
| `- *list-anchor` | top-level `models:` | list expanded and flattened into the sequence |