Routing Strategies: Cheapest, Fastest & Defaults

Steer multi-provider routing with a new routing field — auto, price, throughput, or latency — per request or as a per-project default. Each strategy still falls back when the top pick has bad uptime.

June 14, 2026


When a model is served by more than one provider, the gateway scores them on price, reliability, speed, and cache support and picks the best. Now you can bias that decision toward the one factor you care about — without giving up the automatic fallback that keeps requests reliable.

The `routing` field

Add `routing` to any chat completions request to choose a strategy:

```shell
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v3.2",
    "messages": [{"role": "user", "content": "Hello!"}],
    "routing": "price"
  }'
```
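For readers calling the API from Python, the same request can be sketched with only the standard library. The endpoint, model, and environment variable come from the curl example; the helper name `build_chat_request` is hypothetical, and the function only builds the request so it can be inspected before sending.

```python
import json
import os
import urllib.request

GATEWAY_URL = "https://api.llmgateway.io/v1/chat/completions"


def build_chat_request(model, messages, routing=None, api_key=None):
    """Build (but do not send) a chat completions request.

    `routing` is left out of the body when it is None, so the gateway
    applies the project default or `auto`.
    """
    body = {"model": model, "messages": messages}
    if routing is not None:
        body["routing"] = routing
    # Fall back to the same environment variable the curl example uses.
    key = api_key or os.environ.get("LLM_GATEWAY_API_KEY", "")
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "deepseek-v3.2",
    [{"role": "user", "content": "Hello!"}],
    routing="price",
)
# To send it: urllib.request.urlopen(req), then parse the JSON response body.
```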

| Strategy | Behavior |
| --- | --- |
| `auto` (default) | Full weighted score across price, uptime, throughput, latency, and cache. |
| `price` | Strongly prefer the cheapest provider. |
| `throughput` | Strongly prefer the fastest-generating provider. |
| `latency` | Strongly prefer the lowest time-to-first-token (streaming). |

Each non-auto strategy still keeps a reliability floor: a provider with extremely bad uptime is skipped in favor of a healthy one, so `price` gives you the cheapest provider that actually works, not one that's effectively down.
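To make the reliability floor concrete, here is a rough sketch of how a single-factor strategy could combine with an uptime cutoff. It is not the gateway's actual scoring. The `Provider` fields, the `UPTIME_FLOOR` value, the strict "best on one metric" choice (the gateway only "strongly prefers"), and the fallback when no provider passes the floor are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    price_per_mtok: float  # hypothetical unit: USD per million tokens
    tokens_per_sec: float  # generation throughput
    ttft_ms: float         # time to first token
    uptime: float          # fraction of successful requests, 0.0 to 1.0


UPTIME_FLOOR = 0.5  # hypothetical cutoff for "extremely bad uptime"

# Sort keys where a lower value is better for the chosen strategy.
SORT_KEYS = {
    "price": lambda p: p.price_per_mtok,
    "throughput": lambda p: -p.tokens_per_sec,
    "latency": lambda p: p.ttft_ms,
}


def pick_provider(providers, strategy, uptime_floor=UPTIME_FLOOR):
    """Pick the best provider on one factor among those above the floor."""
    if strategy not in SORT_KEYS:
        raise ValueError(f"unsupported single-factor strategy: {strategy!r}")
    if not providers:
        raise ValueError("no providers to choose from")
    healthy = [p for p in providers if p.uptime >= uptime_floor]
    # Assumption: if nothing passes the floor, consider every provider.
    candidates = healthy or providers
    return min(candidates, key=SORT_KEYS[strategy])


providers = [
    Provider("cheap-but-down", 0.20, 80.0, 400.0, 0.35),
    Provider("balanced", 0.30, 60.0, 300.0, 0.999),
    Provider("fast", 0.60, 150.0, 250.0, 0.995),
]
cheapest_working = pick_provider(providers, "price")
```

With these made-up numbers, `price` skips `cheap-but-down` because its uptime is below the floor and returns `balanced` instead.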

Set a per-project default

Don't want to pass the field on every request? Set a default routing strategy for the whole project under Settings → Routing in the dashboard. Requests that omit `routing` use the project default; an explicit `routing` on a request always wins.
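The precedence described above (request field first, then project default, then `auto`) can be sketched as a small resolver. The function name `resolve_strategy` and the idea of passing the project default in as an argument are hypothetical; the gateway does this server-side.

```python
VALID_STRATEGIES = {"auto", "price", "throughput", "latency"}


def resolve_strategy(request_body, project_default=None):
    """Return the strategy that applies to a single request.

    Order: explicit `routing` on the request, then the project default,
    then `auto`.
    """
    explicit = request_body.get("routing")
    if explicit is not None:
        chosen = explicit
    else:
        chosen = project_default or "auto"
    if chosen not in VALID_STRATEGIES:
        raise ValueError(f"unknown routing strategy: {chosen!r}")
    return chosen


# A project defaulting to `price`, with one request overriding it.
default_case = resolve_strategy({"model": "deepseek-v3.2"}, project_default="price")
override_case = resolve_strategy(
    {"model": "deepseek-v3.2", "routing": "latency"}, project_default="price"
)
```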

Works with coding plans

DevPass coding plans support `auto` and `price` (the cache-aware strategies that keep prompt caching effective). The dashboard greys out `throughput` and `latency` for those projects.

No surprises for pinned providers

Strategies only affect multi-provider routing. Combining `routing` with a pinned provider (for example, `openai/gpt-4o`) returns a 400 rather than silently doing nothing. And an explicit single-factor strategy disables random exploration, so selection stays deterministic.
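A client can catch this conflict before sending the request. The sketch below assumes a pinned provider is written as `provider/model`, as in the `openai/gpt-4o` example, and that any explicit `routing` value counts as a conflict. The helper name and the message text are hypothetical and do not reflect the gateway's actual error body.

```python
from typing import Optional


def routing_conflict(body: dict) -> Optional[str]:
    """Return a message if the request mixes a pinned provider with `routing`.

    Returns None when the combination looks valid.
    """
    model = body.get("model", "")
    # Assumption: a "provider/model" name means the provider is pinned.
    pinned = "/" in model
    if pinned and "routing" in body:
        return (
            f"model {model!r} pins a provider, so `routing` does not apply; "
            "the gateway is expected to answer with HTTP 400"
        )
    return None


conflict = routing_conflict({"model": "openai/gpt-4o", "routing": "price"})
no_conflict = routing_conflict({"model": "deepseek-v3.2", "routing": "price"})
```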

