Skip to content

Configuration

Auto AI Router is configured via a YAML file passed with the -config flag.

See the full example in config.yaml.example.

Full Example

server:
  port: 8080
  max_body_size_mb: 100
  response_body_multiplier: 10
  request_timeout: 60s
  write_timeout: 60s
  idle_timeout: 2m
  idle_conn_timeout: 120s
  max_idle_conns: 200
  max_idle_conns_per_host: 20
  logging_level: info
  master_key: "sk-your-master-key-here"
  default_models_rpm: -1
  model_prices_link: ""

fail2ban:
  max_attempts: 3
  ban_duration: permanent
  error_codes: [401, 403, 429, 500, 502, 503, 504]
  # error_code_rules:
  #   - code: 429
  #     max_attempts: 5
  #     ban_duration: 5m

monitoring:
  prometheus_enabled: true
  log_errors: false
  errors_log_path: "logs/logs.jsonl"

credentials:
  - name: "openai_main"
    type: "openai"
    api_key: "sk-proj-xxxxx"
    base_url: "https://api.openai.com"
    rpm: 100
    tpm: 50000

  - name: "vertex_ai"
    type: "vertex-ai"
    project_id: "your-project-id"
    location: "global"
    credentials_file: "path/to/service-account.json"
    rpm: 100
    tpm: 50000

  - name: "gemini_studio"
    type: "gemini"
    api_key: "os.environ/GEMINI_API_KEY"
    base_url: "https://generativelanguage.googleapis.com"
    rpm: 60
    tpm: -1

  - name: "proxy_fallback"
    type: "proxy"
    base_url: "http://backup-router.local:8080"
    api_key: "sk-remote-master-key"
    rpm: 200
    tpm: 100000
    is_fallback: true

models:
  - name: "gpt-4o"
    credential: openai_main
    rpm: 100
    tpm: 50000
  - name: "gemini-2.5-pro"
    credential: vertex_ai
    rpm: 100
    tpm: 50000

litellm_db:
  enabled: false
  is_required: false
  database_url: "os.environ/LITELLM_DATABASE_URL"
  max_conns: 25
  min_conns: 5
  health_check_interval: 10s
  connect_timeout: 5s
  auth_cache_ttl: 20s
  auth_cache_size: 10000
  log_queue_size: 5000
  log_batch_size: 100
  log_flush_interval: 5s
  log_retry_attempts: 3
  log_retry_delay: 1s

Server Parameters

Parameter Type Default Description
port int 8080 Listen port
max_body_size_mb int 100 Maximum request body size (MB)
response_body_multiplier int 10 Response body limit = max_body_size_mb * this value
request_timeout duration 60s Request timeout
write_timeout duration 60s HTTP server write timeout
idle_timeout duration 2m HTTP server idle timeout (default: 2 * write_timeout)
idle_conn_timeout duration 120s Idle connection timeout for keep-alive connections
max_idle_conns int 200 Maximum idle connections
max_idle_conns_per_host int 20 Maximum idle connections per host
logging_level string info Logging level: info, debug, error
master_key string Required. Master key for client authentication
default_models_rpm int -1 Default RPM limit for models (-1 = unlimited)
model_prices_link string URL or file path to model prices JSON

Fail2Ban Parameters

Parameter Type Description
max_attempts int Maximum failed attempts before banning a credential
ban_duration string Ban duration (permanent for permanent, or duration like 5m, 1h)
error_codes []int HTTP status codes that trigger ban counting
error_code_rules []rule Per-error-code override rules (see example below)

Per-Error-Code Rules

Override max_attempts and ban_duration for specific error codes:

fail2ban:
  max_attempts: 3
  ban_duration: permanent
  error_codes: [401, 403, 429, 500, 502, 503, 504]
  error_code_rules:
    - code: 429      # Rate limit errors
      max_attempts: 5
      ban_duration: 5m

Monitoring Parameters

Parameter Type Description
prometheus_enabled bool Enable Prometheus metrics on /metrics
log_errors bool Enable error logging to file
errors_log_path string Path to error log file

Note

The /health endpoint is always available and cannot be disabled or reconfigured.

Credentials

Each credential defines a connection to an LLM provider. See Providers for details on each type.

Common fields for all credentials:

Field Type Description
name string Unique credential identifier
type string Provider type: openai, anthropic, vertex-ai, gemini, proxy
rpm int Requests per minute limit (-1 = unlimited)
tpm int Tokens per minute limit (-1 = unlimited)
is_fallback bool Use as fallback when primary credentials are exhausted

Models

The models section binds specific models to credentials and optionally sets per-model rate limits.

models:
  - name: "gpt-4o"
    credential: openai_main
    rpm: 100
    tpm: 50000

By default, all models are available through all credentials. Use the models section to restrict which credentials serve which models.

See Load Balancing for details on multi-credential routing.