LLM Gateway for Coding Agents & AI Assistants

Coding agents are only as good as the models behind them

Modern coding agents — Claude Code, Cursor, Cline, or the bespoke one you're building — call a model dozens of times to finish a single task: plan, read files, edit, run, repeat. That puts three demands on whatever sits behind the agent: access to the right model for each step, reliability across long runs, and visibility into what it all costs.

LLM Gateway is that layer. One OpenAI-compatible endpoint, 200+ models, automatic fallback, caching, and a cost ledger for every request.

Use the best model for each step — without rewriting anything

A planning step might want Claude Opus 4.7's reasoning; a quick edit is fine on a cheaper open-weight model; a long-context review wants Gemini 3.1 Pro. With the gateway, that's a one-line change:

1import OpenAI from "openai";2
3const client = new OpenAI({4  baseURL: "https://api.llmgateway.io/v1",5  apiKey: process.env.LLM_GATEWAY_API_KEY,6});7
8const response = await client.chat.completions.create({9  model: "anthropic/claude-opus-4-7", // swap for openai/gpt-5.1 or google-ai-studio/gemini-3.1-pro-preview10  messages: [11    { role: "user", content: "Refactor this module for testability." },12  ],13});

1import OpenAI from "openai";2
3const client = new OpenAI({4  baseURL: "https://api.llmgateway.io/v1",5  apiKey: process.env.LLM_GATEWAY_API_KEY,6});7
8const response = await client.chat.completions.create({9  model: "anthropic/claude-opus-4-7", // swap for openai/gpt-5.1 or google-ai-studio/gemini-3.1-pro-preview10  messages: [11    { role: "user", content: "Refactor this module for testability." },12  ],13});

Same request shape, any provider. No per-provider SDKs, no second set of keys.

Keep long runs alive with automatic fallback

The fastest way to ruin an agent run is a provider hiccup three minutes in. Define a fallback chain and the gateway handles it — if the primary model rate-limits or errors, the request routes to the next model automatically. Your agent finishes the task; your users never see the blip.

Know what every agent costs

Token bills are invisible until they're a problem. The gateway logs every request — model, tokens, latency, dollar cost — and lets you slice spend by API key. Give each agent or project its own key and you get a clean, per-agent breakdown, plus prompt caching to stop paying full price for the repository context that gets resent on every step.

Get started

Point your agent at the gateway base URL, drop in a key, and you're routing across every model with fallback and analytics from the first request.

Frequently asked questions

Does LLM Gateway work with Claude Code, Cursor and Cline?

Yes. LLM Gateway exposes an OpenAI- and Anthropic-compatible API, so any tool that speaks those formats — Claude Code, Cursor, Cline, Aider, Continue, or your own agent — works by pointing it at the gateway base URL and an API key. No SDK changes required.

How does model fallback work?

You can define a primary model and one or more fallbacks. If the primary provider returns an error or rate-limits, the gateway automatically routes the request to the next model in the chain, so your agent keeps running instead of failing the task.

Can I track cost per agent or per project?

Yes. Every request is logged with its model, token counts, latency and dollar cost. You can group spend by API key, so giving each agent, project or environment its own key gives you clean per-agent cost breakdowns.

Will switching models break my code?

No. Because the gateway is OpenAI-compatible, switching from, say, GPT-5.5 to Claude Opus 4.7 is a one-line change to the model string. The request and response shapes stay the same.

Coding agents & AI assistants

Every model, one key

Automatic fallback

Per-request cost analytics

Prompt caching