# Configuration

Auto AI Router is configured via a YAML file passed with the `-config` flag.
See the full example in `config.yaml.example`.
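Only a few fields are strictly needed to get started. A minimal configuration might look like the sketch below (the API keys are placeholders, and the assumption that every omitted field has a usable default should be checked against `config.yaml.example`):

```yaml
server:
  master_key: "sk-your-master-key-here"  # required: clients authenticate with this key

credentials:
  - name: "openai_main"
    type: "openai"
    api_key: "sk-proj-xxxxx"             # placeholder provider key
    base_url: "https://api.openai.com"
```

Start the router with `-config` pointing at that file.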
## Full Example

```yaml
server:
  port: 8080
  max_body_size_mb: 100
  response_body_multiplier: 10
  request_timeout: 60s
  write_timeout: 60s
  idle_timeout: 2m
  idle_conn_timeout: 120s
  max_idle_conns: 200
  max_idle_conns_per_host: 20
  logging_level: info
  master_key: "sk-your-master-key-here"
  default_models_rpm: -1
  model_prices_link: ""

fail2ban:
  max_attempts: 3
  ban_duration: permanent
  error_codes: [401, 403, 429, 500, 502, 503, 504]
  # error_code_rules:
  #   - code: 429
  #     max_attempts: 5
  #     ban_duration: 5m

monitoring:
  prometheus_enabled: true
  log_errors: false
  errors_log_path: "logs/logs.jsonl"

credentials:
  - name: "openai_main"
    type: "openai"
    api_key: "sk-proj-xxxxx"
    base_url: "https://api.openai.com"
    rpm: 100
    tpm: 50000

  - name: "vertex_ai"
    type: "vertex-ai"
    project_id: "your-project-id"
    location: "global"
    credentials_file: "path/to/service-account.json"
    rpm: 100
    tpm: 50000

  - name: "gemini_studio"
    type: "gemini"
    api_key: "os.environ/GEMINI_API_KEY"
    base_url: "https://generativelanguage.googleapis.com"
    rpm: 60
    tpm: -1

  - name: "proxy_fallback"
    type: "proxy"
    base_url: "http://backup-router.local:8080"
    api_key: "sk-remote-master-key"
    rpm: 200
    tpm: 100000
    is_fallback: true

models:
  - name: "gpt-4o"
    credential: openai_main
    rpm: 100
    tpm: 50000
  - name: "gemini-2.5-pro"
    credential: vertex_ai
    rpm: 100
    tpm: 50000

litellm_db:
  enabled: false
  is_required: false
  database_url: "os.environ/LITELLM_DATABASE_URL"
  max_conns: 25
  min_conns: 5
  health_check_interval: 10s
  connect_timeout: 5s
  auth_cache_ttl: 20s
  auth_cache_size: 10000
  log_queue_size: 5000
  log_batch_size: 100
  log_flush_interval: 5s
  log_retry_attempts: 3
  log_retry_delay: 1s
```
## Server Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `port` | int | 8080 | Listen port |
| `max_body_size_mb` | int | 100 | Maximum request body size (MB) |
| `response_body_multiplier` | int | 10 | Response body limit = `max_body_size_mb` * this value |
| `request_timeout` | duration | 60s | Request timeout |
| `write_timeout` | duration | 60s | HTTP server write timeout |
| `idle_timeout` | duration | 2m | HTTP server idle timeout (default: 2 * `write_timeout`) |
| `idle_conn_timeout` | duration | 120s | Idle connection timeout for keep-alive connections |
| `max_idle_conns` | int | 200 | Maximum idle connections |
| `max_idle_conns_per_host` | int | 20 | Maximum idle connections per host |
| `logging_level` | string | info | Logging level: `info`, `debug`, `error` |
| `master_key` | string | — | **Required.** Master key for client authentication |
| `default_models_rpm` | int | -1 | Default RPM limit for models (-1 = unlimited) |
| `model_prices_link` | string | — | URL or file path to model prices JSON |
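As a concrete illustration of how `response_body_multiplier` works: with the values below, request bodies are capped at 100 MB, while response bodies may be up to 100 * 10 = 1000 MB.

```yaml
server:
  max_body_size_mb: 100         # requests capped at 100 MB
  response_body_multiplier: 10  # responses capped at 100 * 10 = 1000 MB
```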
## Fail2Ban Parameters

| Parameter | Type | Description |
|---|---|---|
| `max_attempts` | int | Maximum failed attempts before banning a credential |
| `ban_duration` | string | Ban duration (`permanent` for a permanent ban, or a duration like `5m`, `1h`) |
| `error_codes` | []int | HTTP status codes that trigger ban counting |
| `error_code_rules` | []rule | Per-error-code override rules (see example below) |
### Per-Error-Code Rules

Override `max_attempts` and `ban_duration` for specific error codes:

```yaml
fail2ban:
  max_attempts: 3
  ban_duration: permanent
  error_codes: [401, 403, 429, 500, 502, 503, 504]
  error_code_rules:
    - code: 429  # Rate limit errors
      max_attempts: 5
      ban_duration: 5m
```
## Monitoring Parameters

| Parameter | Type | Description |
|---|---|---|
| `prometheus_enabled` | bool | Enable Prometheus metrics on `/metrics` |
| `log_errors` | bool | Enable error logging to file |
| `errors_log_path` | string | Path to error log file |

> **Note:** The `/health` endpoint is always available and cannot be disabled or reconfigured.
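With `prometheus_enabled: true`, metrics are served on `/metrics`. A Prometheus scrape job for a router listening on the default port might look like the sketch below (the job name and target host are assumptions):

```yaml
scrape_configs:
  - job_name: "auto-ai-router"       # hypothetical job name
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:8080"]  # the router's server.port
```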
## Credentials

Each credential defines a connection to an LLM provider. See Providers for details on each type.

Common fields for all credentials:

| Field | Type | Description |
|---|---|---|
| `name` | string | Unique credential identifier |
| `type` | string | Provider type: `openai`, `anthropic`, `vertex-ai`, `gemini`, `proxy` |
| `rpm` | int | Requests per minute limit (-1 = unlimited) |
| `tpm` | int | Tokens per minute limit (-1 = unlimited) |
| `is_fallback` | bool | Use as fallback when primary credentials are exhausted |
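Secrets need not be stored in the file itself: as the full example shows for `GEMINI_API_KEY`, an `os.environ/VAR_NAME` value is resolved from the environment. A sketch (the variable name here is an assumption):

```yaml
credentials:
  - name: "openai_main"
    type: "openai"
    api_key: "os.environ/OPENAI_API_KEY"  # read from the environment at startup
    base_url: "https://api.openai.com"
    rpm: 100
    tpm: 50000
```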
## Models

The `models` section binds specific models to credentials and optionally sets per-model rate limits.

By default, all models are available through all credentials. Use the `models` section to restrict which credentials serve which models.

Models can also be declared directly inside a credential via its `models:` field; they are automatically extracted and added to the global models list with the credential name pre-filled.

See Load Balancing for details on multi-credential routing.
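For example, declaring the same model name under two credentials would let the router distribute traffic between them. This is only a sketch using credentials from the full example above; see Load Balancing for the authoritative multi-credential syntax:

```yaml
models:
  - name: "gemini-2.5-pro"
    credential: vertex_ai      # primary Vertex AI credential
    rpm: 100
  - name: "gemini-2.5-pro"
    credential: gemini_studio  # second credential serving the same model
    rpm: 60
```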
## YAML Anchors for Models

When many credentials share the same set of models, YAML anchors eliminate repetition. Define a template once with `&anchor-name` and reference it with `*anchor-name`.

### List anchor in `x-model-templates`

The `x-model-templates` top-level key is a dedicated namespace for anchor definitions. It is not processed by the router; its sole purpose is to hold anchors so they can be referenced elsewhere.
```yaml
x-model-templates:
  vertex-base-models: &vertex-base-models
    - name: gemini-2.5-flash
      rpm: 100
      tpm: 50000
    - name: gemini-2.5-pro
      rpm: 50
      tpm: 100000

credentials:
  - name: "vertex_v1"
    type: "vertex-ai"
    project_id: "proj-1"
    location: "global"
    credentials_file: "keys/proj-1.json"
    rpm: 100
    models: *vertex-base-models  # expands to the full list

  - name: "vertex_v2"
    type: "vertex-ai"
    project_id: "proj-2"
    location: "global"
    credentials_file: "keys/proj-2.json"
    rpm: 100
    models: *vertex-base-models  # same list, credential set to "vertex_v2"
```
Each model copy automatically gets the parent credential name injected, so no manual `credential:` field is needed inside the template.
### Single-model anchor

An anchor can also target a single model mapping and be used as an item in a `models:` list:

```yaml
x-model-templates:
  flash: &flash
    name: gemini-2.5-flash
    rpm: 100
    tpm: 50000

credentials:
  - name: "vertex_v1"
    type: "vertex-ai"
    project_id: "proj-1"
    location: "global"
    credentials_file: "keys/proj-1.json"
    rpm: 100
    models:
      - *flash                  # single model from anchor
      - name: gemini-2.5-pro    # inline model
        rpm: 50
        tpm: 100000
```
### Expanding a list anchor inside the top-level `models:` section

A list anchor can be expanded inline within the top-level `models:` sequence. The router flattens the result so all items end up as a flat list:

```yaml
x-model-templates:
  shared-models: &shared-models
    - name: gemini-2.5-flash
      credential: vertex_v1
      rpm: 100
      tpm: 50000
    - name: gemini-2.5-pro
      credential: vertex_v1
      rpm: 50
      tpm: 100000

models:
  - *shared-models  # expands and flattens both items into the list
  - name: gpt-4o
    credential: openai_main
    rpm: 60
    tpm: 80000
```
### Supported combinations

| Syntax | Location | Result |
|---|---|---|
| `models: *list-anchor` | inside a credential | list items added with that credential name |
| `- *list-anchor` | inside a credential's `models:` | list items added with that credential name |
| `- *single-model-anchor` | inside a credential's `models:` | single model added with that credential name |
| `- *list-anchor` | top-level `models:` | list expanded and flattened into the sequence |