Routing Strategies: Cheapest, Fastest & Defaults
Steer multi-provider routing with a new routing field — auto, price, throughput, or latency — per request or as a per-project default. Each strategy still falls back when the top pick has bad uptime.

When a model is served by more than one provider, the gateway scores them on price, reliability, speed, and cache support and picks the best. Now you can bias that decision toward the one factor you care about — without giving up the automatic fallback that keeps requests reliable.
The routing field
Add routing to any chat completions request to choose the strategy:
```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v3.2",
    "messages": [{"role": "user", "content": "Hello!"}],
    "routing": "price"
  }'
```

| Strategy | Behavior |
|---|---|
| auto (default) | Full weighted score across price, uptime, throughput, latency, and cache. |
| price | Strongly prefer the cheapest provider. |
| throughput | Strongly prefer the fastest-generating provider. |
| latency | Strongly prefer the lowest time-to-first-token (streaming). |
Each non-auto strategy still keeps a reliability floor: a provider with extremely bad uptime is skipped in favor of a healthy one, so price gives you the cheapest provider that actually works — not one that's effectively down.
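To make the reliability floor concrete, here is a minimal Python sketch of how a single-factor strategy could skip unhealthy providers before ranking. It is an illustration, not the gateway's actual scoring code: the `Provider` fields, the `UPTIME_FLOOR` value, and the provider names and numbers are all hypothetical. It also simplifies "strongly prefer" to a hard pick on one factor.

```python
from dataclasses import dataclass

# Hypothetical floor; the gateway's real uptime threshold is not stated in this post.
UPTIME_FLOOR = 0.80


@dataclass(frozen=True)
class Provider:
    name: str
    price_per_mtok: float  # USD per million tokens, lower is better
    tokens_per_sec: float  # generation speed, higher is better
    ttft_ms: float         # time to first token, lower is better
    uptime: float          # recent success ratio in [0, 1]


# Each single-factor strategy maps to a sort key where smaller means better.
SORT_KEYS = {
    "price": lambda p: p.price_per_mtok,
    "throughput": lambda p: -p.tokens_per_sec,
    "latency": lambda p: p.ttft_ms,
}


def pick(providers, routing):
    """Pick the best provider on one factor, ignoring those below the floor."""
    if routing not in SORT_KEYS:
        raise ValueError(f"unsupported single-factor strategy: {routing!r}")
    healthy = [p for p in providers if p.uptime >= UPTIME_FLOOR]
    # If nothing clears the floor, rank the full list rather than fail outright.
    candidates = healthy or list(providers)
    return min(candidates, key=SORT_KEYS[routing])


providers = [
    Provider("cheap-but-down", 0.20, 40.0, 900.0, 0.35),
    Provider("mid", 0.30, 60.0, 400.0, 0.99),
    Provider("fast", 0.55, 120.0, 250.0, 0.98),
]

print(pick(providers, "price").name)       # mid
print(pick(providers, "throughput").name)  # fast
```

In this sketch `price` returns `mid` rather than `cheap-but-down`, because the cheapest provider falls below the floor and is skipped.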
Set a per-project default
Don't want to pass the field on every request? Set a default routing strategy for the whole project under Settings → Routing in the dashboard. Requests that omit routing use the project default; an explicit routing on a request always wins.
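The precedence rule can be sketched in a few lines of Python. The function name `effective_routing` and the idea of passing the project default as an argument are assumptions made for illustration; the gateway resolves this server-side.

```python
from typing import Optional

VALID_STRATEGIES = {"auto", "price", "throughput", "latency"}


def effective_routing(request_body: dict, project_default: Optional[str] = None) -> str:
    """Return the strategy that applies to one request.

    Order of precedence: explicit request field, then project default, then "auto".
    """
    requested = request_body.get("routing")
    chosen = requested if requested is not None else (project_default or "auto")
    if chosen not in VALID_STRATEGIES:
        raise ValueError(f"unknown routing strategy: {chosen!r}")
    return chosen


# The request omits routing, so the project default applies.
print(effective_routing({"model": "deepseek-v3.2"}, project_default="price"))  # price
# An explicit routing on the request wins over the project default.
print(effective_routing({"routing": "latency"}, project_default="price"))      # latency
```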
Works with coding plans
DevPass coding plans support auto and price (the cache-aware strategies that keep prompt caching effective). The dashboard greys out throughput and latency for those projects.
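If you build requests programmatically, a small client-side guard can mirror what the dashboard enforces. This is a sketch under assumptions: `check_strategy` and the `coding_plan` flag are hypothetical names, and this post does not say how the API itself responds to `throughput` or `latency` on a coding-plan project.

```python
ALL_STRATEGIES = frozenset({"auto", "price", "throughput", "latency"})
# Per the post, coding plans support only the cache-aware strategies.
CODING_PLAN_STRATEGIES = frozenset({"auto", "price"})


def check_strategy(routing: str, coding_plan: bool) -> str:
    """Return routing unchanged if the project type allows it, else raise."""
    allowed = CODING_PLAN_STRATEGIES if coding_plan else ALL_STRATEGIES
    if routing not in allowed:
        raise ValueError(
            f"routing={routing!r} is not available here; choose one of {sorted(allowed)}"
        )
    return routing


print(check_strategy("price", coding_plan=True))        # price
print(check_strategy("throughput", coding_plan=False))  # throughput
```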
No surprises for pinned providers
Strategies only affect multi-provider routing. Combining routing with a pinned provider — e.g. openai/gpt-4o — returns a 400 rather than silently doing nothing. And an explicit single-factor strategy disables random exploration, so selection stays deterministic.
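To catch the pinned-provider conflict before it reaches the gateway, a request builder could refuse the combination locally. The sketch below assumes that a pinned provider is expressed as a `provider/model` ID containing a slash (as in `openai/gpt-4o`) and that any explicit `routing` value conflicts with it; `is_pinned` and `build_body` are hypothetical helpers, not part of any SDK.

```python
import json
from typing import Optional


def is_pinned(model: str) -> bool:
    # Assumption: a "provider/model" ID such as "openai/gpt-4o" pins the provider.
    return "/" in model


def build_body(model: str, messages: list, routing: Optional[str] = None) -> str:
    """Serialize a chat completions body, rejecting routing on a pinned provider."""
    if routing is not None and is_pinned(model):
        raise ValueError(
            f"routing={routing!r} cannot be combined with pinned provider {model!r}"
        )
    body = {"model": model, "messages": messages}
    if routing is not None:
        body["routing"] = routing
    return json.dumps(body)


messages = [{"role": "user", "content": "Hello!"}]
print(build_body("deepseek-v3.2", messages, routing="price"))
print(build_body("openai/gpt-4o", messages))  # pinned, no routing field sent
```

Failing early on the client avoids a round trip that would end in a 400.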