Automatic Retry & Fallback with Full Routing Transparency
When a provider fails, LLMGateway now automatically retries your request on another provider. Every attempt is logged with full routing visibility, so you always know what happened.

Automatic Retry & Fallback
LLMGateway now automatically retries failed requests on alternate providers. If your request hits a 500 error, timeout, or connection failure on the first provider, the gateway seamlessly retries on the next best provider — all within the same API call.
How It Works
- Your request is routed to the best available provider using our smart routing algorithm
- If that provider fails (5xx, timeout, network error), the gateway automatically selects the next best provider
- The retry happens transparently — your application receives the successful response as if nothing went wrong
- Up to 2 retries are attempted before an error is returned (see the sketch after this list)
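To make the flow concrete, here is a rough TypeScript sketch of that loop. Everything in it is illustrative rather than LLMGateway's actual internals: the `Attempt` shape simply mirrors the routing entries shown below, and `callProvider` stands in for whatever the gateway does to reach an upstream provider.

```typescript
// Illustrative sketch only: not LLMGateway's real routing code.
interface Attempt {
  provider: string;
  model: string;
  status_code: number;
  error_type: string; // "none", "server_error", "timeout", ...
  succeeded: boolean;
}

interface ProviderResult {
  status: number;
  errorType: string;
  body?: unknown;
}

// Walk the ranked provider list, record every attempt, stop early on
// success or on a 4xx client error, and give up after the initial
// attempt plus two retries.
async function completeWithFallback(
  providers: { name: string; model: string }[],
  callProvider: (p: { name: string; model: string }) => Promise<ProviderResult>,
): Promise<{ body: unknown; routing: Attempt[] }> {
  const routing: Attempt[] = [];
  const maxAttempts = 3; // 1 initial attempt + up to 2 retries

  for (const provider of providers.slice(0, maxAttempts)) {
    const result = await callProvider(provider);
    const succeeded = result.status >= 200 && result.status < 300;
    routing.push({
      provider: provider.name,
      model: provider.model,
      status_code: result.status,
      error_type: succeeded ? "none" : result.errorType,
      succeeded,
    });
    if (succeeded) return { body: result.body, routing }; // caller only ever sees this
    if (result.status >= 400 && result.status < 500) break; // client errors are not retried
  }
  throw new Error("All providers failed");
}
```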
Full Routing Transparency
Every provider attempt is now tracked in the `routing` array, both in the API response metadata and in your activity logs:
1{2 "metadata": {3 "routing": [4 {5 "provider": "openai",6 "model": "gpt-4o",7 "status_code": 500,8 "error_type": "server_error",9 "succeeded": false10 },11 {12 "provider": "azure",13 "model": "gpt-4o",14 "status_code": 200,15 "error_type": "none",16 "succeeded": true17 }18 ]19 }20}
1{2 "metadata": {3 "routing": [4 {5 "provider": "openai",6 "model": "gpt-4o",7 "status_code": 500,8 "error_type": "server_error",9 "succeeded": false10 },11 {12 "provider": "azure",13 "model": "gpt-4o",14 "status_code": 200,15 "error_type": "none",16 "succeeded": true17 }18 ]19 }20}
Retried Log Linking
Failed attempts that were retried are clearly marked in your activity logs:
- A "Retried" badge appears on failed logs that were successfully retried
- Each retried log links directly to the successful log that replaced it
- You can click through from a failed log to see the successful response
This means you'll never mistake a retried failure for an actual unrecovered error.
Uptime-Aware Routing
Failed attempts still count against the provider's uptime score. If a provider keeps failing:
- Its uptime score drops in real-time
- The exponential penalty kicks in below 95% uptime (illustrated after this list)
- Future requests are automatically routed away from it
- Your application stays reliable without any code changes
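The exact scoring curve isn't spelled out here, so treat the following as a purely hypothetical illustration of the shape of the behavior (the constant and the formula are invented, not LLMGateway's actual algorithm): once uptime drops below the 95% threshold, a provider's routing weight could fall off exponentially, so even a few percentage points of downtime push traffic elsewhere.

```typescript
// Hypothetical illustration: exponential penalty on routing weight once a
// provider's uptime drops below 95%. The halving constant is invented to
// show the shape of the curve, not LLMGateway's real scoring.
function routingWeight(uptime: number): number {
  const threshold = 0.95;
  if (uptime >= threshold) return 1;
  // Each percentage point below the threshold roughly halves the weight.
  return Math.pow(0.5, (threshold - uptime) * 100);
}

console.log(routingWeight(0.99)); // 1       -> routed normally
console.log(routingWeight(0.93)); // ~0.25   -> strongly deprioritized
console.log(routingWeight(0.85)); // ~0.001  -> effectively routed around
```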
Controlling Fallback Behavior
Disable Fallback
Use the `X-No-Fallback: true` header to disable automatic retries:
```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-No-Fallback: true" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
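If you reach the gateway through an OpenAI-compatible SDK (an assumption here, based on the /v1/chat/completions endpoint in the curl example above), the same header can be attached per request rather than globally:

```typescript
import OpenAI from "openai";

// Assumes the gateway speaks the OpenAI-compatible API shown in the curl
// example above; the base URL and key come from that example.
const client = new OpenAI({
  apiKey: process.env.LLM_GATEWAY_API_KEY,
  baseURL: "https://api.llmgateway.io/v1",
});

const completion = await client.chat.completions.create(
  {
    model: "openai/gpt-4o",
    messages: [{ role: "user", content: "Hello!" }],
  },
  // Per-request header: this call fails fast instead of falling back.
  { headers: { "X-No-Fallback": "true" } },
);
```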
When Fallback Is Disabled
Retries are automatically disabled when:
- You set the `X-No-Fallback: true` header
- You request a specific provider (e.g., `openai/gpt-4o`)
- The error is a client error (4xx) rather than a server error
Read the routing docs for the full details on how routing and fallback work together.