How to Estimate LLM Token Costs Before You Ship
A practical guide to forecasting LLM costs: the token formula, real-world examples across GPT-5.4, Claude, and Gemini, and a free calculator to run the numbers.

The first time a bill from OpenAI or Anthropic lands, most teams have the same reaction: "how did we spend that?" Not because the providers hid anything — the pricing is public — but because cost per token is a fiddly unit. A 1,200-token prompt feels abstract until it becomes 10 million of them in a month.
This post walks through how LLM costs actually add up, shows side-by-side math for common workloads, and points you at our free Token Cost Calculator so you can plug in your own numbers before you ship.
The Token Cost Formula
Every major provider charges the same way: a price per 1 million input tokens and a price per 1 million output tokens. Output tokens usually cost 4–8x more than input.
1cost_per_request2 = (input_tokens / 1_000_000) * input_price3 + (output_tokens / 1_000_000) * output_price
1cost_per_request2 = (input_tokens / 1_000_000) * input_price3 + (output_tokens / 1_000_000) * output_price
Roughly, 1 token ≈ 0.75 English words. A 1,000-word prompt is ~1,330 tokens. A 500-word response is ~670 tokens.
Three numbers you need
Before you can estimate anything, get three numbers:
- Average input tokens per request — prompt + system + any context/RAG chunks
- Average output tokens per request — what the model writes back
- Requests per day — multiplied out to the month
If you don't know (2) or (3) yet, estimate from a prototype: log 100 real requests, average them, and extrapolate. Rough numbers now beat precise numbers later.
Real Costs, Side by Side
A realistic chatbot request: 1,000 input tokens, 500 output tokens. Prices as of April 2026:
| Model | Cost per request | 10K/day | 1M/day |
|---|---|---|---|
| Claude Opus 4.7 | $0.01750 | $175.00 | $17,500 |
| GPT-5.4 | $0.01000 | $100.00 | $10,000 |
| Gemini 3.1 Pro | $0.00800 | $80.00 | $8,000 |
| Claude Sonnet 4.6 | $0.01050 | $105.00 | $10,500 |
| GPT-5.4 Mini | $0.00300 | $30.00 | $3,000 |
| Claude Haiku 4.5 | $0.00350 | $35.00 | $3,500 |
| Gemini 3.1 Flash Lite | $0.00100 | $10.00 | $1,000 |
| GPT-5.4 Nano | $0.00083 | $8.25 | $825 |
A few things jump out from the math, not the marketing:
- The gap between flagship and budget tiers is 20x, not 2x. A model choice that "doesn't matter for the prototype" becomes a six-figure line item at scale.
- Output tokens dominate total cost on any chat workload. If you're optimizing, cut response length before you cut prompt length.
- Mid-tier models (Sonnet 4.6, GPT-5.4 Mini, Gemini 3.1 Flash Lite) are where the quality-to-cost ratio is best for most real apps.
Where Estimates Go Wrong
Four things that quietly inflate real bills beyond the estimate:
1. System prompts count. Every request pays for the full system prompt. A 2,000-token system prompt adds $0.005 per request on GPT-5.4 — trivial once, $50,000/year at 10 requests per second.
2. RAG chunks are huge. Retrieval-augmented apps commonly inject 4–8k tokens of context. That input can outweigh the user's message by 10x.
3. Tool calls multiply requests. An agent that calls three tools and reads their outputs is four LLM round-trips, not one. Multiply the per-request cost by the average depth.
4. Retries and failures. Provider errors, timeouts, and schema validation retries all cost the same as successful requests. Budget 5–15% overhead.
A good rule of thumb: take your naive estimate and multiply by 1.5 to 2.0. If the real bill comes in under, great. If it comes in over, you already planned for it.
Use the Calculator
Plugging those three numbers (input tokens, output tokens, requests/day) into a spreadsheet works. It's also tedious to redo every time a new model ships.
We built a Token Cost Calculator that:
- Covers 300+ models across 25+ providers with up-to-date pricing
- Lets you compare official provider rates side-by-side
- Shows what the same workload costs on cheaper providers for the same model (DeepSeek, Groq, Cerebras, and others often undercut the official price)
- Generates a shareable link so you can send estimates to your team
No signup required. Try it: llmgateway.io/token-cost-calculator.
Three Ways to Lower the Bill Without Lowering Quality
Once you can estimate the cost, you can attack it. Three high-leverage moves:
1. Cache identical requests
Many production workloads repeat: FAQ bots, classification, batch pipelines, dev/test. A cache hit costs nothing. LLM Gateway caches responses in Redis with per-project TTLs from 10 seconds to a year — teams see 30–90% hit rates on suitable workloads. See how caching works.
2. Route to the cheapest healthy provider
The same model is often sold by multiple providers at different prices. DeepSeek-V3, Llama 3.3, Qwen, and others are available across Together, Fireworks, DeepInfra, Groq, and more — sometimes at half the price of the original. A gateway with smart routing picks the cheapest provider that meets your uptime and latency bar, automatically. See cut LLM costs with request routing.
3. Match the model to the task
Don't call GPT-5.4 to classify sentiment. A three-tier setup — Nano or Flash Lite for classification, Haiku or Mini for structured extraction, flagship for reasoning — typically cuts total spend 60–80% with no user-visible quality drop.
TL;DR
- Cost per request =
(input_tokens / 1M) * input_price + (output_tokens / 1M) * output_price - Output tokens dominate. Shorten responses before you shorten prompts.
- Flagship vs budget is ~20x. Pick the cheapest model that still meets quality.
- Multiply naive estimates by 1.5–2x to account for system prompts, RAG, tools, retries.
- Caching and smart routing are the two biggest levers — both are free on LLM Gateway.