Automatic Retry & Fallback with Full Routing Transparency
When a provider fails, LLMGateway now automatically retries your request on another provider. Every attempt is logged with full routing visibility, so you always know what happened.

Automatic Retry & Fallback
LLMGateway now automatically retries failed requests on alternate providers. If your request hits a 500 error, timeout, or connection failure on the first provider, the gateway seamlessly retries on the next best provider — all within the same API call.
How It Works
- Your request is routed to the best available provider using our smart routing algorithm
- If that provider fails (5xx, timeout, network error), the gateway automatically selects the next best provider
- The retry happens transparently — your application receives the successful response as if nothing went wrong
- Up to 2 retries are attempted before an error is returned (see the sketch after this list)
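To make the flow concrete, here is a rough TypeScript sketch of that loop. Everything in it is illustrative rather than LLMGateway's actual internals: the `Attempt` shape simply mirrors the routing entries shown below, and `callProvider` stands in for whatever the gateway does to reach an upstream provider.

```typescript
// Illustrative sketch only: not LLMGateway's real routing code.
interface Attempt {
  provider: string;
  model: string;
  status_code: number;
  error_type: string; // "none", "server_error", "timeout", ...
  succeeded: boolean;
}

interface ProviderResult {
  status: number;
  errorType: string;
  body?: unknown;
}

// Walk the ranked provider list, record every attempt, stop early on
// success or on a 4xx client error, and give up after the initial
// attempt plus two retries.
async function completeWithFallback(
  providers: { name: string; model: string }[],
  callProvider: (p: { name: string; model: string }) => Promise<ProviderResult>,
): Promise<{ body: unknown; routing: Attempt[] }> {
  const routing: Attempt[] = [];
  const maxAttempts = 3; // 1 initial attempt + up to 2 retries

  for (const provider of providers.slice(0, maxAttempts)) {
    const result = await callProvider(provider);
    const succeeded = result.status >= 200 && result.status < 300;
    routing.push({
      provider: provider.name,
      model: provider.model,
      status_code: result.status,
      error_type: succeeded ? "none" : result.errorType,
      succeeded,
    });
    if (succeeded) return { body: result.body, routing }; // caller only ever sees this
    if (result.status >= 400 && result.status < 500) break; // client errors are not retried
  }
  throw new Error("All providers failed");
}
```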
Full Routing Transparency
Every provider attempt is now tracked in the `routing` array, both in the API response metadata and in your activity logs:
1{2 "metadata": {3 "routing": [4 {5 "provider": "openai",6 "model": "gpt-4o",7 "status_code": 500,8 "error_type": "server_error",9 "succeeded": false10 },11 {12 "provider": "azure",13 "model": "gpt-4o",14 "status_code": 200,15 "error_type": "none",16 "succeeded": true17 }18 ]19 }20}
1{2 "metadata": {3 "routing": [4 {5 "provider": "openai",6 "model": "gpt-4o",7 "status_code": 500,8 "error_type": "server_error",9 "succeeded": false10 },11 {12 "provider": "azure",13 "model": "gpt-4o",14 "status_code": 200,15 "error_type": "none",16 "succeeded": true17 }18 ]19 }20}
Retried Log Linking
Failed attempts that were retried are clearly marked in your activity logs:
- A "Retried" badge appears on failed logs that were successfully retried
- Each retried log links directly to the successful log that replaced it
- You can click through from a failed log to see the successful response
This means you'll never mistake a retried failure for an actual unrecovered error.
Uptime-Aware Routing
Failed attempts still count against the provider's uptime score. If a provider keeps failing:
- Its uptime score drops in real-time
- The exponential penalty kicks in below 95% uptime (illustrated after this list)
- Future requests are automatically routed away from it
- Your application stays reliable without any code changes
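The exact scoring curve isn't spelled out here, so treat the following as a purely hypothetical illustration of the shape of the behavior (the constant and the formula are invented, not LLMGateway's actual algorithm): once uptime drops below the 95% threshold, a provider's routing weight could fall off exponentially, so even a few percentage points of downtime push traffic elsewhere.

```typescript
// Hypothetical illustration: exponential penalty on routing weight once a
// provider's uptime drops below 95%. The halving constant is invented to
// show the shape of the curve, not LLMGateway's real scoring.
function routingWeight(uptime: number): number {
  const threshold = 0.95;
  if (uptime >= threshold) return 1;
  // Each percentage point below the threshold roughly halves the weight.
  return Math.pow(0.5, (threshold - uptime) * 100);
}

console.log(routingWeight(0.99)); // 1       -> routed normally
console.log(routingWeight(0.93)); // ~0.25   -> strongly deprioritized
console.log(routingWeight(0.85)); // ~0.001  -> effectively routed around
```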
Controlling Fallback Behavior
Disable Fallback
Use the `X-No-Fallback: true` header to disable automatic retries:
```bash
curl -X POST "https://api.llmgateway.io/v1/chat/completions" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-No-Fallback: true" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
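If you reach the gateway through an OpenAI-compatible SDK (an assumption here, based on the /v1/chat/completions endpoint in the curl example above), the same header can be attached per request rather than globally:

```typescript
import OpenAI from "openai";

// Assumes the gateway speaks the OpenAI-compatible API shown in the curl
// example above; the base URL and key come from that example.
const client = new OpenAI({
  apiKey: process.env.LLM_GATEWAY_API_KEY,
  baseURL: "https://api.llmgateway.io/v1",
});

const completion = await client.chat.completions.create(
  {
    model: "openai/gpt-4o",
    messages: [{ role: "user", content: "Hello!" }],
  },
  // Per-request header: this call fails fast instead of falling back.
  { headers: { "X-No-Fallback": "true" } },
);
```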
When Fallback Is Disabled
Retries are automatically disabled when:
- You set the `X-No-Fallback: true` header
- You request a specific provider (e.g., `openai/gpt-4o`)
- The error is a client error (4xx) rather than a server error
Read the routing docs for the full details on how routing and fallback work together.