AI cost optimization & FinOps

See where every token goes, cache repeat context, and route to the cheapest model that clears the bar.

See where every token goes

Per-request logging of model, tokens, latency and dollar cost turns an opaque LLM bill into a breakdown you can actually act on.

Route to the cheapest model that works

Send each request to the most affordable model that meets the bar, with frontier models reserved for the calls that truly need them.

Cache repeated context

Prompt caching stops you paying full price for system prompts and context that get resent on every call.

One view across every provider

Spend across OpenAI, Anthropic, Google and the rest lands in one dashboard instead of a dozen separate billing pages.

Most LLM bills are a black box — that's the real problem

Teams rarely overspend on LLMs on purpose. They overspend because the bill is opaque: one big number, no breakdown of which feature, model or prompt drove it, and no easy way to test a cheaper alternative. You can't optimize what you can't see.

LLM Gateway makes spend legible and then gives you the levers to lower it — visibility, routing, and caching — through one OpenAI-compatible API.

Step one: make every token visible

The gateway logs each request with its model, token counts, latency and exact dollar cost. Group those by API key and the opaque monthly number becomes a breakdown by team, environment or feature:

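```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.llmgateway.io/v1",
  apiKey: process.env.LLM_GATEWAY_API_KEY,
});

// Example conversation; replace with your application's own messages.
const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [
  { role: "user", content: "Summarize this support ticket in one sentence." },
];

// Every call below is logged with model, tokens, latency and cost,
// across every provider, in one dashboard.
const response = await client.chat.completions.create({
  model: "openai/gpt-5.1",
  messages,
});
```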
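To make the "group by API key" idea concrete, here is a minimal sketch of the roll-up, assuming you have exported request logs into your own code. The `RequestLog` shape, its field names, and the sample values are hypothetical and are not the gateway's actual export schema.

```typescript
// Hypothetical shape of one logged request. Field names are assumptions,
// not the gateway's real export format.
interface RequestLog {
  apiKeyName: string; // e.g. one key per team, environment or feature
  model: string;
  promptTokens: number;
  completionTokens: number;
  latencyMs: number;
  costUsd: number;
}

interface KeySummary {
  requests: number;
  totalTokens: number;
  costUsd: number;
}

// Sum request count, tokens and cost for each API key.
function summarizeByKey(logs: RequestLog[]): Record<string, KeySummary> {
  const summary: Record<string, KeySummary> = {};
  for (const log of logs) {
    const entry = summary[log.apiKeyName] ?? { requests: 0, totalTokens: 0, costUsd: 0 };
    entry.requests += 1;
    entry.totalTokens += log.promptTokens + log.completionTokens;
    entry.costUsd += log.costUsd;
    summary[log.apiKeyName] = entry;
  }
  return summary;
}

// Made-up sample data, for illustration only.
const sampleLogs: RequestLog[] = [
  { apiKeyName: "search-prod", model: "openai/gpt-5.1", promptTokens: 1200, completionTokens: 300, latencyMs: 840, costUsd: 0.5 },
  { apiKeyName: "search-prod", model: "openai/gpt-5.1", promptTokens: 800, completionTokens: 200, latencyMs: 610, costUsd: 0.25 },
  { apiKeyName: "support-bot", model: "openai/gpt-5.1", promptTokens: 400, completionTokens: 100, latencyMs: 450, costUsd: 0.125 },
];

const byKey = summarizeByKey(sampleLogs);
console.log(byKey);
```

With one key per feature, each entry in `byKey` is that feature's share of the bill.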

Step two: route to the cheapest model that clears the bar

Once you can see cost per request, the waste is obvious: flagship models doing work a cheaper model handles fine. Routing lets you send each request to the most affordable model that meets your quality bar, and keep the expensive models for the calls that genuinely need them. Switching is a one-line model change, so trying a cheaper option costs almost nothing.
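The selection rule itself is simple enough to sketch. The example below assumes you keep your own catalog of candidate models with a price and a quality score from your own evals. The `ModelOption` type, the `example/...` model ids, the prices and the scores are all invented for illustration, and this is not how the gateway's router is implemented.

```typescript
// Hypothetical catalog entry. Model ids, prices and scores below are
// invented; measure quality with your own eval set.
interface ModelOption {
  id: string;
  usdPerMillionTokens: number; // assumed blended price
  qualityScore: number; // assumed 0..1 score from your own evals
}

// Return the cheapest model whose score meets the bar,
// or undefined if no model qualifies.
function cheapestModelMeetingBar(
  options: ModelOption[],
  qualityBar: number,
): ModelOption | undefined {
  return options
    .filter((m) => m.qualityScore >= qualityBar) // filter() copies, so sort() leaves the input intact
    .sort((a, b) => a.usdPerMillionTokens - b.usdPerMillionTokens)[0];
}

const catalog: ModelOption[] = [
  { id: "example/frontier", usdPerMillionTokens: 10, qualityScore: 0.95 },
  { id: "example/mid", usdPerMillionTokens: 2, qualityScore: 0.88 },
  { id: "example/small", usdPerMillionTokens: 0.5, qualityScore: 0.7 },
];

const routineChoice = cheapestModelMeetingBar(catalog, 0.85); // picks "example/mid"
const hardChoice = cheapestModelMeetingBar(catalog, 0.9); // picks "example/frontier"
console.log(routineChoice?.id, hardChoice?.id);
```

Raising the bar only for the requests that need it is what keeps the frontier model off the routine calls.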

Step three: stop paying for the same tokens twice

System prompts, instructions and shared context get resent on nearly every call. Prompt caching means those repeated tokens don't cost full price each time — one of the highest-leverage savings in most production workloads.
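A rough worked example shows why this matters. The sketch below assumes a prompt made of a reusable prefix (system prompt and shared context) plus fresh tokens, and a provider that bills cached input at a fraction of the normal rate. The price, the `cachedInputMultiplier` and the token counts are assumptions. Real cache pricing varies by provider and model, and any cache-write surcharge is ignored here.

```typescript
// All numbers are assumptions for illustration; check your provider's
// actual cached-input pricing.
interface CachePricing {
  usdPerMillionInputTokens: number;
  cachedInputMultiplier: number; // e.g. 0.25 means cached tokens cost 25% of full price
}

// Input cost of one request: a reusable prefix plus fresh, uncached tokens.
function inputCostUsd(
  prefixTokens: number,
  freshTokens: number,
  pricing: CachePricing,
  cacheHit: boolean,
): number {
  const perToken = pricing.usdPerMillionInputTokens / 1e6;
  const prefixRate = cacheHit ? perToken * pricing.cachedInputMultiplier : perToken;
  return prefixTokens * prefixRate + freshTokens * perToken;
}

const pricing: CachePricing = { usdPerMillionInputTokens: 2, cachedInputMultiplier: 0.25 };

// 4,000-token shared prefix, 1,000 fresh tokens per call.
const uncached = inputCostUsd(4000, 1000, pricing, false);
const cached = inputCostUsd(4000, 1000, pricing, true);
const savedFraction = 1 - cached / uncached; // roughly 0.6 with these made-up numbers

console.log({ uncached, cached, savedFraction });
```

The larger the shared prefix is relative to the fresh tokens, the bigger the saving on each cache hit.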

FinOps for AI, in one place

Per-key attribution, one dashboard across every provider, and cost data on every request give you the foundation of an AI FinOps practice: budgets you can defend, spend you can attribute, and a clear before-and-after when you tune routing or caching — without a rewrite.
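As one illustration of what per-key attribution enables, here is a small sketch of a budget check, assuming you already have month-to-date spend per API key (for example, summed from request logs). The key names, budget figures and the `checkBudgets` helper are hypothetical and not a gateway feature.

```typescript
// Hypothetical inputs: month-to-date spend per API key and a budget per key.
// Names and numbers are invented for illustration.
interface KeyBudget {
  key: string;
  monthlyBudgetUsd: number;
}

interface BudgetStatus {
  key: string;
  spentUsd: number;
  monthlyBudgetUsd: number;
  overBudget: boolean;
}

// Compare each key's spend with its budget; keys with no spend count as 0.
function checkBudgets(
  spendByKey: Record<string, number>,
  budgets: KeyBudget[],
): BudgetStatus[] {
  return budgets.map(({ key, monthlyBudgetUsd }) => {
    const spentUsd = spendByKey[key] ?? 0;
    return { key, spentUsd, monthlyBudgetUsd, overBudget: spentUsd > monthlyBudgetUsd };
  });
}

const spendByKey: Record<string, number> = { "search-prod": 750, "support-bot": 120 };
const budgets: KeyBudget[] = [
  { key: "search-prod", monthlyBudgetUsd: 500 },
  { key: "support-bot", monthlyBudgetUsd: 200 },
  { key: "batch-eval", monthlyBudgetUsd: 100 },
];

const statuses = checkBudgets(spendByKey, budgets);
console.log(statuses.filter((s) => s.overBudget).map((s) => s.key)); // ["search-prod"]
```

Running the same check before and after a routing or caching change gives you the before-and-after comparison in numbers.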

Frequently asked questions

How does LLM Gateway reduce my LLM costs?

Three ways: visibility (per-request cost analytics so you can find waste), routing (send each request to the cheapest model that meets your quality bar), and caching (avoid paying full price for repeated prompt context). Together they cut spend without requiring you to rewrite your application.

Do I have to change my code to save money?

Very little. The gateway is OpenAI-compatible, so adopting it is a base-URL and key change. From there, routing, caching and analytics are configuration — not a rewrite of your application logic.

Can I attribute spend to teams or features?

Yes. Requests are logged and can be grouped by API key, so issuing keys per team, environment or feature gives you clean cost attribution — the foundation of any AI FinOps practice.

Does using a gateway add latency or markup?

The routing overhead is minimal, and the savings from caching and cheaper-model routing typically outweigh it many times over. You also get one consolidated view of spend instead of reconciling bills across providers.

One API for every model

Route across 200+ models with fallback, caching and per-request cost analytics. Start free in minutes.