Coding agents & AI assistants

One API key behind Claude Code, Cursor, Cline and your own agents — with model fallback and a real cost ledger.

Every model, one key

Claude Opus 4.7, GPT-5.5, Gemini 3.1 Pro, Qwen, Kimi, DeepSeek — 200+ models behind a single OpenAI-compatible endpoint. Switch by changing one string.

Automatic fallback

When a provider rate-limits or errors, the gateway retries the request on another provider. Long agent runs keep going instead of dying mid-task.

Per-request cost analytics

Every call is logged with its model, tokens, latency and exact dollar cost — so you can see what each agent, project or user actually spends.

Prompt caching

Cache the system prompt and repository context that agents resend on every step, and stop paying full price for the same tokens.

Coding agents are only as good as the models behind them

Modern coding agents — Claude Code, Cursor, Cline, or the bespoke one you're building — call a model dozens of times to finish a single task: plan, read files, edit, run, repeat. That puts three demands on whatever sits behind the agent: access to the right model for each step, reliability across long runs, and visibility into what it all costs.

LLM Gateway is that layer. One OpenAI-compatible endpoint, 200+ models, automatic fallback, caching, and a cost ledger for every request.

Use the best model for each step — without rewriting anything

A planning step might want Claude Opus 4.7's reasoning; a quick edit is fine on a cheaper open-weight model; a long-context review wants Gemini 3.1 Pro. With the gateway, that's a one-line change:

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.llmgateway.io/v1",
  apiKey: process.env.LLM_GATEWAY_API_KEY,
});

const response = await client.chat.completions.create({
  model: "anthropic/claude-opus-4-7", // swap for openai/gpt-5.1 or google-ai-studio/gemini-3.1-pro-preview
  messages: [
    { role: "user", content: "Refactor this module for testability." },
  ],
});
```

Same request shape, any provider. No per-provider SDKs, no second set of keys.
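If your agent already distinguishes its steps, the per-step choice can live in one lookup table. The following is a minimal sketch, not part of the gateway: the step names, `MODEL_FOR_STEP` and `buildRequest` are hypothetical, and the `edit` entry is a placeholder you would replace with an open-weight model ID from the gateway's model list.

```typescript
// Hypothetical step names for an agent loop; not part of any gateway API.
type Step = "plan" | "edit" | "review";

// "plan" and "review" reuse the model IDs from the example above.
// "edit" is a placeholder, not a real model ID: substitute a cheaper
// open-weight model from your gateway's model list.
const MODEL_FOR_STEP: Record<Step, string> = {
  plan: "anthropic/claude-opus-4-7",
  edit: "your-provider/your-open-weight-model",
  review: "google-ai-studio/gemini-3.1-pro-preview",
};

interface ChatRequest {
  model: string;
  messages: { role: "user"; content: string }[];
}

// Build the request body for a step; only the model string varies.
function buildRequest(step: Step, prompt: string): ChatRequest {
  return {
    model: MODEL_FOR_STEP[step],
    messages: [{ role: "user", content: prompt }],
  };
}
```

The returned object has the same shape as the request body in the example above, so it should be usable as the argument to `client.chat.completions.create`.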

Keep long runs alive with automatic fallback

The fastest way to ruin an agent run is a provider hiccup three minutes in. Define a fallback chain and the gateway handles it — if the primary model rate-limits or errors, the request routes to the next model automatically. Your agent finishes the task; your users never see the blip.
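To make the behavior concrete, here is a rough client-side sketch of what walking a fallback chain means. It only illustrates the logic; the gateway does this for you on its side, and `withFallback` and `CallModel` are hypothetical names, not the gateway's configuration format.

```typescript
// Illustration of fallback-chain logic only. The helper and its names
// are hypothetical; they are not part of the gateway or the OpenAI SDK.
type CallModel<T> = (model: string) => Promise<T>;

// Try each model in order and return the first successful result,
// together with the model that produced it.
async function withFallback<T>(
  chain: string[],
  call: CallModel<T>,
): Promise<{ model: string; result: T }> {
  const failures: string[] = [];
  for (const model of chain) {
    try {
      return { model, result: await call(model) };
    } catch (err) {
      // Treat any error (rate limit, provider outage) as a reason to move on.
      failures.push(`${model}: ${String(err)}`);
    }
  }
  throw new Error(`all ${chain.length} models failed: ${failures.join("; ")}`);
}
```

In this sketch `call` would wrap a single chat completion request for the given model string. An error is only surfaced to the agent once every model in the chain has failed.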

Know what every agent costs

Token bills are invisible until they're a problem. The gateway logs every request (model, tokens, latency, dollar cost) and lets you slice spend by API key. Give each agent or project its own key and you get a clean per-agent breakdown. Prompt caching then cuts what you pay for the repository context that gets resent on every step.
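As a sketch of the per-key breakdown, suppose you have a list of logged requests in hand. The `RequestLog` shape and its field names below are assumptions for illustration; they are not the gateway's actual log schema.

```typescript
// Hypothetical shape of one logged request. Field names are assumptions,
// chosen to mirror the fields listed above (model, tokens, latency, cost).
interface RequestLog {
  apiKeyLabel: string; // e.g. one key per agent or project
  model: string;
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
  costUsd: number;
}

// Sum dollar cost per API key.
function spendByKey(logs: RequestLog[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const log of logs) {
    const previous = totals.get(log.apiKeyLabel) ?? 0;
    totals.set(log.apiKeyLabel, previous + log.costUsd);
  }
  return totals;
}
```

With one key per agent, each entry in the resulting map is that agent's total spend over the logged period.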

Get started

Point your agent at the gateway base URL, drop in a key, and you're routing across every model with fallback and analytics from the first request.

Frequently asked questions

Does LLM Gateway work with Claude Code, Cursor and Cline?

Yes. LLM Gateway exposes an OpenAI- and Anthropic-compatible API, so any tool that speaks those formats — Claude Code, Cursor, Cline, Aider, Continue, or your own agent — works by pointing it at the gateway base URL and an API key. No SDK changes required.

How does model fallback work?

You can define a primary model and one or more fallbacks. If the primary provider returns an error or rate-limits, the gateway automatically routes the request to the next model in the chain, so your agent keeps running instead of failing the task.

Can I track cost per agent or per project?

Yes. Every request is logged with its model, token counts, latency and dollar cost. You can group spend by API key, so giving each agent, project or environment its own key gives you clean per-agent cost breakdowns.

Will switching models break my code?

No. Because the gateway is OpenAI-compatible, switching from, say, GPT-5.5 to Claude Opus 4.7 is a one-line change to the model string. The request and response shapes stay the same.

One API for every model

Route across 200+ models with fallback, caching and per-request cost analytics. Start free in minutes.