Cheapest Models
Models priced at or under $0.20 per million input tokens and $1.50 per million output tokens, for classification, extraction, and high-volume workloads
| Provider | Input ($/1M tokens) | Output ($/1M tokens) | Cached input ($/1M tokens) |
|---|---|---|---|
| Google AI Studio | $0.07 | $0.30 | — |
| Google AI Studio | $0.07 | $0.30 | — |
| Google AI Studio | $0.07 | $0.30 | — |
| Google AI Studio | $0.07 | $0.30 | — |
| Google AI Studio | $0.07 | $0.30 | — |
| Google Vertex AI | $0.04 | $0.15 | — |
| Google AI Studio | $0.04 | $0.15 | — |
| Google AI Studio | $0.04 | $0.15 | — |
| Google Vertex AI | $0.04 | $0.15 | — |
| Google AI Studio | $0.15 | $0.60 | — |
| Google Vertex AI | $0.15 | $0.60 | — |
| Google Vertex AI | $0.15 | $0.60 | — |
| Google AI Studio | $0.15 | $0.60 | — |
| Google AI Studio | $0.15 | $0.60 | — |
| Google Vertex AI | $0.15 | $0.60 | — |
| OpenAI | $0.05 | $0.40 | $0.01 |
| Azure | $0.05 | $0.40 | $0.01 |
| Groq | $0.10 | $0.50 | — |
| Together AI | $0.05 | $0.20 | — |
| NanoGPT | $0.04 | $0.15 | — |
| Cerebras | $0.35 | $0.75 | — |
| Together AI | $0.15 | $0.60 | — |
| ByteDance | $0.10 | $0.50 | $0.02 |
| Azure | $0.15 | $0.60 | — |
| NanoGPT | $0.05 | $0.25 | — |
| Nebius AI | $0.15 | $0.60 | — |
| Groq | $0.15 | $0.75 | — |
| Azure | $0.10 | $0.40 | $0.02 |
| OpenAI | $0.10 | $0.40 | $0.02 |
| OpenAI | $0.15 | $0.60 | $0.07 |
Every model here is offered by at least one provider at no more than $0.20 per million input tokens and $1.50 per million output tokens, and some, like Qwen3 4B and Llama 3.2 3B, start as low as $0.03 per million input tokens. At these prices a million input tokens costs somewhere between a few cents and 20 cents, which changes what's economical: classify every support ticket, summarize every call, run an LLM check on every commit.
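To put numbers on this, here is a minimal sketch of the per-request arithmetic. It assumes prices are quoted per million tokens and that cached tokens are a subset of the input tokens, billed at the cached rate instead of the normal input rate. `Price` and `cost_usd` are hypothetical helpers, and the workload size is illustrative, not measured.

```python
from typing import Optional
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Per-million-token prices in USD, as listed in the table."""
    input: float
    output: float
    cached_input: Optional[float] = None  # "—" in the table: no separate cached rate


def cost_usd(price: Price, input_tokens: int, output_tokens: int,
             cached_tokens: int = 0) -> float:
    """Estimate cost; cached_tokens is the part of input_tokens billed at the cached rate (assumption)."""
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    # Fall back to the normal input rate when no cached rate is listed.
    cached_rate = price.cached_input if price.cached_input is not None else price.input
    uncached = input_tokens - cached_tokens
    total = uncached * price.input + cached_tokens * cached_rate + output_tokens * price.output
    return total / 1_000_000


# Prices from one OpenAI row in the table: $0.05 input, $0.40 output, $0.01 cached input.
row = Price(input=0.05, output=0.40, cached_input=0.01)

# Hypothetical workload: 10,000 tickets at ~800 input and ~50 output tokens each.
total = cost_usd(row, input_tokens=10_000 * 800, output_tokens=10_000 * 50)
print(f"${total:.2f}")  # prints "$0.60"
```

Eight million input tokens cost $0.40 and half a million output tokens cost $0.20 at that row's prices, so classifying the whole batch comes to about 60 cents.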
Cheap doesn't mean toy: GPT-OSS 120B, GLM-4.7 Flash, Gemini 2.5 Flash-Lite, and Qwen3.5 9B punch far above their price on everyday tasks. Route high-volume work here and reserve frontier models for the requests that actually need them.
Frequently asked questions
What is the cheapest LLM API?
In this catalog, Llama 3.2 3B and Qwen3 4B start around $0.03 per million input tokens, with GPT-OSS 20B at about $0.04 and GLM-4.7 Flash at $0.06. Prices differ by provider, so check the list: the same model is often cheaper through one provider than another.
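Because one model can appear under several providers at different prices, choosing the cheapest listing is a small reduction over the price list. A minimal sketch, assuming a blended price in which 25% of tokens are output (tune `output_ratio` to your own traffic). The model names, provider names, and prices below are placeholders, not values taken from the table.

```python
# Hypothetical listings: (model, provider, input $/1M, output $/1M).
LISTINGS = [
    ("model-a", "provider-x", 0.05, 0.40),
    ("model-a", "provider-y", 0.04, 0.15),
    ("model-b", "provider-x", 0.15, 0.60),
    ("model-b", "provider-z", 0.35, 0.75),
]


def blended_price(input_price: float, output_price: float, output_ratio: float = 0.25) -> float:
    """Weighted $/1M tokens, assuming output_ratio of all tokens are output tokens."""
    return (1 - output_ratio) * input_price + output_ratio * output_price


def cheapest_by_model(listings, output_ratio: float = 0.25) -> dict:
    """Return {model: provider} for the lowest blended price per model."""
    best = {}  # model -> (provider, blended price)
    for model, provider, inp, out in listings:
        price = blended_price(inp, out, output_ratio)
        if model not in best or price < best[model][1]:
            best[model] = (provider, price)
    return {model: provider for model, (provider, _) in best.items()}


print(cheapest_by_model(LISTINGS))
# {'model-a': 'provider-y', 'model-b': 'provider-x'}
```

Weighting by expected output share matters because output tokens are several times pricier than input tokens, so the listing with the lowest input price is not always the cheapest overall.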
Are cheap models good enough for production?
For classification, extraction, routing, summarization, and simple chat — usually yes. Small models fail mostly on multi-step reasoning and niche knowledge. A common pattern is a cheap model as the default with automatic escalation to a frontier model when confidence is low.
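The cheap-default, escalate-on-low-confidence pattern fits in a few lines. This is a sketch, not any particular gateway's API: `cheap` and `frontier` are hypothetical callables that return an answer plus a confidence score in [0, 1], and the 0.8 threshold is an arbitrary starting point.

```python
from typing import Callable, Tuple

# A model call returns (answer, confidence). How confidence is derived (log-probs,
# a self-reported score, a validator) is left to the caller.
ModelCall = Callable[[str], Tuple[str, float]]


def answer_with_escalation(prompt: str, cheap: ModelCall, frontier: ModelCall,
                           threshold: float = 0.8) -> Tuple[str, str]:
    """Try the cheap model first; escalate to the frontier model below the threshold."""
    answer, confidence = cheap(prompt)
    if confidence >= threshold:
        return answer, "cheap"
    answer, _ = frontier(prompt)
    return answer, "frontier"


# Stub models standing in for real API calls.
def cheap_stub(prompt: str) -> Tuple[str, float]:
    return ("billing", 0.95) if "invoice" in prompt else ("unknown", 0.3)


def frontier_stub(prompt: str) -> Tuple[str, float]:
    return ("technical", 0.99)


print(answer_with_escalation("Where is my invoice?", cheap_stub, frontier_stub))
# ('billing', 'cheap')
print(answer_with_escalation("App crashes on login", cheap_stub, frontier_stub))
# ('technical', 'frontier')
```

Returning which tier answered makes it easy to log the escalation rate, which is the number that decides whether the cheap default is actually saving money.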
How else can I cut LLM costs?
Cache responses for repeated requests, use cached input pricing for long shared prefixes, batch offline work, and set per-project spending limits. Routing through a gateway also lets you switch to whichever provider currently offers the lowest price for the same model with zero code changes.
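Response caching is the simplest of these to sketch. The class below is a hypothetical in-memory cache keyed on a SHA-256 hash of the canonicalized request, so identical requests (same model, prompt, and parameters) are paid for once. It assumes deterministic settings such as temperature 0; a production cache would usually add expiry and a shared store.

```python
import hashlib
import json
from typing import Callable, Dict


class ResponseCache:
    """In-memory cache keyed on a hash of the full request dict."""

    def __init__(self, call_model: Callable[[dict], str]):
        self._call_model = call_model  # hypothetical: sends the request, returns the text
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(request: dict) -> str:
        # sort_keys makes the key independent of dict insertion order.
        canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def complete(self, request: dict) -> str:
        key = self._key(request)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = self._call_model(request)
        self._store[key] = result
        return result


# Usage with a fake model that records every paid call.
calls = []


def fake_model(request: dict) -> str:
    calls.append(request)
    return f"summary of {request['prompt']}"


cache = ResponseCache(fake_model)
req = {"model": "cheap-model", "prompt": "ticket 1", "temperature": 0}
cache.complete(req)
cache.complete(dict(reversed(list(req.items()))))  # same request, different key order
print(cache.hits, cache.misses, len(calls))  # 1 1 1
```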
Are there free models?
Yes. Free provider listings show up at $0.00 in this list. They're rate-limited and best for prototyping; for production traffic, the paid models on this page are the reliable low-cost option.