Smarter Routing: Sticky Sessions, Bedrock Regions & More
Pin conversations to one provider for warm caches with x-session-id, route Bedrock by region, and reach Gemini via Google AI Studio plus Vertex embeddings.

A batch of routing upgrades that make multi-turn apps faster, give you finer control over where requests land, and expand the providers you can reach.
Sticky Session Routing
Multi-turn conversations now stay on the same upstream provider, so provider-side prompt caches stay warm across turns — fewer cold reads, lower latency, and lower cost.
Just pass a stable identifier on the request:
1curl -X POST "https://api.llmgateway.io/v1/chat/completions" \2 -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \3 -H "x-session-id: conversation-1234" \4 -H "Content-Type: application/json" \5 -d '{ "model": "anthropic/claude-opus-4-8", "messages": [...] }'1curl -X POST "https://api.llmgateway.io/v1/chat/completions" \2 -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \3 -H "x-session-id: conversation-1234" \4 -H "Content-Type: application/json" \5 -d '{ "model": "anthropic/claude-opus-4-8", "messages": [...] }'- Send the same
x-session-idfor every turn in a conversation - If you don't set the header, the gateway falls back to Anthropic's
metadata.user_idor the OpenAIuserfield - Works alongside auto-select — your session sticks to the provider it first landed on
Stable Preferred-Provider Routing
When you set a preferred provider, routing now sticks to it consistently instead of drifting between equivalent options — predictable behavior for teams that have standardized on a specific upstream.
AWS Bedrock Region Routing
Bedrock requests now default to a global cross-region inference profile, with the option to pin a specific region group (US, EU, APAC) or an exact region. One set of credentials works across all of them.
- Defaults to
globalfor the best availability - Override with
LLM_AWS_BEDROCK_REGIONto keep traffic in a specific geography for data-residency needs
Google AI Studio Provider
Gemini models are now reachable through Google AI Studio (google-ai-studio) in addition to Vertex AI — bring whichever Google credentials you already have.
Embeddings Get More Reach & Resilience
Following the embeddings launch, the /v1/embeddings endpoint now:
- Supports Google Vertex AI embedding models
- Falls back to another key on the same provider if one key fails
- Surfaces key-health and routing metadata, just like chat completions