Coding agents are only as good as the models behind them
Modern coding agents — Claude Code, Cursor, Cline, or the bespoke one you're building — call a model dozens of times to finish a single task: plan, read files, edit, run, repeat. That puts three demands on whatever sits behind the agent: access to the right model for each step, reliability across long runs, and visibility into what it all costs.
LLM Gateway is that layer. One OpenAI-compatible endpoint, 200+ models, automatic fallback, caching, and a cost ledger for every request.
Use the best model for each step — without rewriting anything
A planning step might want Claude Opus 4.7's reasoning; a quick edit is fine on a cheaper open-weight model; a long-context review wants Gemini 3.1 Pro. With the gateway, that's a one-line change:
    import OpenAI from "openai";

    const client = new OpenAI({
      baseURL: "https://api.llmgateway.io/v1",
      apiKey: process.env.LLM_GATEWAY_API_KEY,
    });

    const response = await client.chat.completions.create({
      model: "anthropic/claude-opus-4-7", // swap for openai/gpt-5.1 or google-ai-studio/gemini-3.1-pro-preview
      messages: [
        { role: "user", content: "Refactor this module for testability." },
      ],
    });

Same request shape, any provider. No per-provider SDKs, no second set of keys.
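To make per-step selection concrete, here is a minimal sketch, assuming an agent loop with three steps. The step names, the `MODEL_FOR_STEP` table, and `buildRequest` are hypothetical names introduced for illustration. The "edit" entry is a placeholder, since no specific open-weight model ID is given here.

```typescript
// Hypothetical step names for an agent loop; adjust to your own pipeline.
type AgentStep = "plan" | "edit" | "review";

// The "plan" and "review" IDs are the ones named in this article.
// The "edit" entry is a placeholder, not a real model ID.
const MODEL_FOR_STEP: Record<AgentStep, string> = {
  plan: "anthropic/claude-opus-4-7",
  edit: "your-provider/your-open-weight-model", // placeholder: substitute a real ID
  review: "google-ai-studio/gemini-3.1-pro-preview",
};

interface ChatRequest {
  model: string;
  messages: { role: "user"; content: string }[];
}

// Build the request body for a step. Only the `model` field varies by step.
function buildRequest(step: AgentStep, prompt: string): ChatRequest {
  return {
    model: MODEL_FOR_STEP[step],
    messages: [{ role: "user", content: prompt }],
  };
}
```

The returned object has the same shape as the request body used with `client.chat.completions.create`, so switching models for a step should only mean editing one table entry.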
Keep long runs alive with automatic fallback
The fastest way to ruin an agent run is a provider hiccup three minutes in. Define a fallback chain and the gateway handles the failure for you: if the primary model rate-limits or errors, the request routes to the next model in the chain automatically. Your agent finishes the task; your users never see the blip.
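The behavior described above can be pictured with a small client-side sketch. This is not the gateway's configuration API or its implementation, only an illustration of what "try the next model in the chain" means. `CallModel`, `FallbackResult`, and `completeWithFallback` are hypothetical names, and the sketch assumes any thrown error counts as a reason to fall back.

```typescript
// Sends one request to one model and resolves with its text.
// Hypothetical signature; in practice this would wrap a chat completion call.
type CallModel = (model: string, prompt: string) => Promise<string>;

interface FallbackResult {
  model: string;      // the model that finally answered
  text: string;       // the answer it returned
  failures: string[]; // models that were tried and failed, in order
}

// Try each model in order and return the first successful answer.
async function completeWithFallback(
  chain: string[],
  prompt: string,
  call: CallModel,
): Promise<FallbackResult> {
  const failures: string[] = [];
  for (const model of chain) {
    try {
      const text = await call(model, prompt);
      return { model, text, failures };
    } catch {
      // Assumption: any error (rate limit, provider outage) triggers fallback.
      failures.push(model);
    }
  }
  throw new Error(`All models in the chain failed: ${failures.join(", ")}`);
}
```

With the gateway doing this server-side, the agent's own code would not need a loop like this. The sketch only shows the ordering logic and why the caller still gets one answer.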
Know what every agent costs
Token bills are invisible until they're a problem. The gateway logs every request (model, tokens, latency, dollar cost) and lets you slice spend by API key. Give each agent or project its own key and you get a clean, per-agent breakdown. Prompt caching, in turn, stops you paying full price for the repository context that gets resent on every step.
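As a rough illustration of the per-key breakdown, the sketch below groups request logs by key and totals tokens and cost. The `RequestLog` shape and its field names are assumptions made for this example. The gateway's actual log format may differ.

```typescript
// Hypothetical shape of one logged request; real field names may differ.
interface RequestLog {
  apiKeyName: string; // one key per agent or project
  model: string;
  totalTokens: number;
  latencyMs: number;
  costUsd: number;
}

interface KeySpend {
  requests: number;
  totalTokens: number;
  costUsd: number;
}

// Sum request count, tokens, and dollar cost for each API key.
function spendByKey(logs: RequestLog[]): Map<string, KeySpend> {
  const totals = new Map<string, KeySpend>();
  for (const log of logs) {
    const current =
      totals.get(log.apiKeyName) ?? { requests: 0, totalTokens: 0, costUsd: 0 };
    current.requests += 1;
    current.totalTokens += log.totalTokens;
    current.costUsd += log.costUsd;
    totals.set(log.apiKeyName, current);
  }
  return totals;
}
```

Because each agent has its own key, the map's keys double as agent names, which is what makes the breakdown "per-agent" without any extra tagging.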
Get started
Point your agent at the gateway base URL, drop in a key, and you're routing across every model with fallback and analytics from the first request.