Most LLM bills are a black box — that's the real problem
Teams rarely overspend on LLMs on purpose. They overspend because the bill is opaque: one big number, no breakdown of which feature, model or prompt drove it, and no easy way to test a cheaper alternative. You can't optimize what you can't see.
LLM Gateway makes spend legible and then gives you the levers to lower it — visibility, routing, and caching — through one OpenAI-compatible API.
Step one: make every token visible
The gateway logs each request with its model, token counts, latency and exact dollar cost. Group those by API key and the opaque monthly number becomes a breakdown by team, environment or feature:
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.llmgateway.io/v1",
  apiKey: process.env.LLM_GATEWAY_API_KEY,
});

// Every call below is logged with model, tokens, latency and cost,
// across every provider, in one dashboard.
const response = await client.chat.completions.create({
  model: "openai/gpt-5.1",
  messages,
});
```
Step two: route to the cheapest model that clears the bar
Once you can see cost per request, the waste is obvious: flagship models doing work a cheaper model handles fine. Routing lets you send each request to the most affordable model that meets your quality bar and keep the expensive models for the calls that genuinely need them. Switching is a one-line model change, so trying a cheaper option costs almost nothing.
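To make the routing idea concrete, here is a minimal sketch of "cheapest model that clears the bar" as a selection rule. It is not the gateway's routing implementation: the model names, prices and quality scores are made-up placeholders, and `pickCheapestModel` is a hypothetical helper. In practice the quality score would come from your own evaluation set.

```typescript
// Hypothetical catalog entry. Names, prices and scores are illustrative
// placeholders, not real gateway or provider data.
interface ModelOption {
  model: string;
  usdPerMillionInputTokens: number;
  usdPerMillionOutputTokens: number;
  qualityScore: number; // e.g. pass rate on your own eval set, from 0 to 1
}

// Estimated dollar cost of one request for a given model.
function estimateCostUsd(
  option: ModelOption,
  inputTokens: number,
  outputTokens: number,
): number {
  return (
    (inputTokens * option.usdPerMillionInputTokens +
      outputTokens * option.usdPerMillionOutputTokens) /
    1_000_000
  );
}

// Return the cheapest model whose quality score meets the bar,
// or undefined if no model qualifies.
function pickCheapestModel(
  options: ModelOption[],
  minQuality: number,
  inputTokens: number,
  outputTokens: number,
): ModelOption | undefined {
  const eligible = options.filter((o) => o.qualityScore >= minQuality);
  if (eligible.length === 0) return undefined;
  return eligible.reduce((best, o) =>
    estimateCostUsd(o, inputTokens, outputTokens) <
    estimateCostUsd(best, inputTokens, outputTokens)
      ? o
      : best,
  );
}

const catalog: ModelOption[] = [
  { model: "example/flagship", usdPerMillionInputTokens: 10, usdPerMillionOutputTokens: 30, qualityScore: 0.95 },
  { model: "example/mid", usdPerMillionInputTokens: 1, usdPerMillionOutputTokens: 4, qualityScore: 0.88 },
  { model: "example/small", usdPerMillionInputTokens: 0.2, usdPerMillionOutputTokens: 0.8, qualityScore: 0.7 },
];

// A typical request of 2,000 input and 500 output tokens with a 0.85 bar.
const choice = pickCheapestModel(catalog, 0.85, 2000, 500);
console.log(choice?.model); // prints "example/mid"
```

The returned `model` string is what you would pass as the `model` field of the request. Raising the bar to 0.9 selects the flagship model instead, which is the point: the expensive model is used only where the threshold demands it.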
Step three: stop paying for the same tokens twice
System prompts, instructions and shared context get resent on nearly every call. With prompt caching, those repeated tokens no longer cost full price each time, which makes it one of the highest-leverage savings in most production workloads.
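A rough back-of-the-envelope sketch shows why the repeated prefix matters. Everything here is an assumption for illustration: the price, the discount, and the simplification that every call after the first hits the cache with no cache-write surcharge. Real discounts and cache rules vary by provider and model, and `inputCostUsd` is a hypothetical helper.

```typescript
// Hypothetical pricing. Real cached-token discounts differ per provider.
interface CachePricing {
  usdPerMillionInputTokens: number;
  cachedTokenDiscount: number; // fraction off the input price for cached tokens, 0 to 1
}

// Input-token cost for `calls` requests that share the same prefix.
// Simplifying assumption: with caching on, the first call pays full price
// for the prefix and every later call pays the discounted rate.
function inputCostUsd(
  sharedPrefixTokens: number,
  uniqueTokensPerCall: number,
  calls: number,
  pricing: CachePricing,
  cachingEnabled: boolean,
): number {
  const usdPerToken = pricing.usdPerMillionInputTokens / 1_000_000;
  const uniqueCost = uniqueTokensPerCall * calls * usdPerToken;
  if (!cachingEnabled || calls === 0) {
    return sharedPrefixTokens * calls * usdPerToken + uniqueCost;
  }
  const discountedCalls = calls - 1;
  const prefixCost =
    sharedPrefixTokens *
    usdPerToken *
    (1 + discountedCalls * (1 - pricing.cachedTokenDiscount));
  return prefixCost + uniqueCost;
}

// Illustrative workload: a 3,000-token system prompt, 200 unique tokens
// per call, 10,000 calls, $2 per million input tokens, 50% cached discount.
const pricing: CachePricing = { usdPerMillionInputTokens: 2, cachedTokenDiscount: 0.5 };
const withoutCache = inputCostUsd(3000, 200, 10_000, pricing, false);
const withCache = inputCostUsd(3000, 200, 10_000, pricing, true);
console.log(withoutCache.toFixed(2), withCache.toFixed(2));
```

Under these assumed numbers the shared prefix accounts for most of the input bill, so discounting it cuts input cost by nearly half. The longer the shared prefix relative to the unique part of each call, the larger the effect.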
FinOps for AI, in one place
Per-key attribution, one dashboard across every provider, and cost data on every request give you the foundation of an AI FinOps practice: budgets you can defend, spend you can attribute, and a clear before-and-after when you tune routing or caching — without a rewrite.
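As an illustration of per-key attribution, the sketch below rolls per-request cost records up into one summary per API key. The `RequestLog` shape and its field names are assumptions made for this example, not the gateway's actual log schema, and the sample records are invented.

```typescript
// Hypothetical log record. Field names are assumptions for illustration,
// not the gateway's real export format.
interface RequestLog {
  apiKeyLabel: string; // e.g. one key per team, environment or feature
  model: string;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

interface SpendSummary {
  requests: number;
  totalTokens: number;
  costUsd: number;
}

// Group request logs by API key and total up requests, tokens and cost.
function spendByKey(logs: RequestLog[]): Map<string, SpendSummary> {
  const summary = new Map<string, SpendSummary>();
  for (const log of logs) {
    const entry =
      summary.get(log.apiKeyLabel) ?? { requests: 0, totalTokens: 0, costUsd: 0 };
    entry.requests += 1;
    entry.totalTokens += log.inputTokens + log.outputTokens;
    entry.costUsd += log.costUsd;
    summary.set(log.apiKeyLabel, entry);
  }
  return summary;
}

// Invented sample data.
const sampleLogs: RequestLog[] = [
  { apiKeyLabel: "search-prod", model: "example/flagship", inputTokens: 1200, outputTokens: 300, costUsd: 0.5 },
  { apiKeyLabel: "search-prod", model: "example/mid", inputTokens: 800, outputTokens: 200, costUsd: 0.25 },
  { apiKeyLabel: "support-bot-staging", model: "example/mid", inputTokens: 400, outputTokens: 100, costUsd: 0.125 },
];

const report = spendByKey(sampleLogs);
for (const [key, s] of report) {
  console.log(`${key}: ${s.requests} requests, ${s.totalTokens} tokens, $${s.costUsd.toFixed(3)}`);
}
```

Running the same rollup before and after a routing or caching change gives the before-and-after comparison per key, which is usually more persuasive than a single monthly total.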