Cheapest Models

Models priced at or under $0.20 per million input tokens and $1.50 per million output tokens, suited to classification, extraction, and high-volume workloads

104 models


Provider	Model	Input ($/M tokens)	Output ($/M tokens)	Cached input ($/M tokens)
Google AI Studio	gemma-3-12b-it	$0.07	$0.30	—
Google AI Studio	gemma-3-4b-it	$0.07	$0.30	—
Google AI Studio	gemma-3-1b-it	$0.07	$0.30	—
Google AI Studio	gemma-3n-e4b-it	$0.07	$0.30	—
Google AI Studio	gemma-3n-e2b-it	$0.07	$0.30	—
Google Vertex AI	gemini-1.5-flash-8b	$0.04	$0.15	—
Google AI Studio	gemini-1.5-flash-8b	$0.04	$0.15	—
Google AI Studio	gemini-1.5-flash	$0.04	$0.15	—
Google Vertex AI	gemini-1.5-flash	$0.04	$0.15	—
Google AI Studio	gemini-2.5-flash-preview-04-17-thinking	$0.15	$0.60	—
Google Vertex AI	gemini-2.5-flash-preview-04-17-thinking	$0.15	$0.60	—
Google Vertex AI	gemini-2.5-flash-preview-05-20	$0.15	$0.60	—
Google AI Studio	gemini-2.5-flash-preview-05-20	$0.15	$0.60	—
Google AI Studio	gemini-2.5-flash-preview-04-17	$0.15	$0.60	—
Google Vertex AI	gemini-2.5-flash-preview-04-17	$0.15	$0.60	—
OpenAI	gpt-5-nano	$0.05	$0.40	$0.01
Azure	gpt-5-nano	$0.05	$0.40	$0.01
Groq	gpt-oss-20b	$0.10	$0.50	—
Together AI	gpt-oss-20b	$0.05	$0.20	—
NanoGPT	gpt-oss-20b	$0.04	$0.15	—
Cerebras	gpt-oss-120b	$0.35	$0.75	—
Together AI	gpt-oss-120b	$0.15	$0.60	—
ByteDance	gpt-oss-120b	$0.10	$0.50	$0.02
Azure	gpt-oss-120b	$0.15	$0.60	—
NanoGPT	gpt-oss-120b	$0.05	$0.25	—
Nebius AI	gpt-oss-120b	$0.15	$0.60	—
Groq	gpt-oss-120b	$0.15	$0.75	—
Azure	gpt-4.1-nano	$0.10	$0.40	$0.02
OpenAI	gpt-4.1-nano	$0.10	$0.40	$0.02
OpenAI	gpt-4o-mini	$0.15	$0.60	$0.07

Every model here costs at most $0.20 per million input tokens and $1.50 per million output tokens, and some, like Qwen3 4B and Llama 3.2 3B, cost as little as $0.03 per million input tokens. At these prices, a million input tokens costs between 3 and 20 cents, which changes what's economical: classify every support ticket, summarize every call, run an LLM check on every commit.
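To make the arithmetic concrete, here is a minimal sketch of a cost estimate. The workload size (10,000 tickets at roughly 400 input and 20 output tokens each) is a made-up assumption; the prices are the gpt-5-nano list prices from the table above.

```python
def workload_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the cost in dollars, given prices in $ per million tokens."""
    input_cost = (input_tokens / 1_000_000) * input_price_per_m
    output_cost = (output_tokens / 1_000_000) * output_price_per_m
    return input_cost + output_cost


# Hypothetical workload: 10,000 support tickets,
# about 400 input tokens and 20 output tokens per ticket.
tickets = 10_000
cost = workload_cost(
    input_tokens=tickets * 400,
    output_tokens=tickets * 20,
    input_price_per_m=0.05,   # gpt-5-nano input price from the table
    output_price_per_m=0.40,  # gpt-5-nano output price from the table
)
print(f"${cost:.2f}")  # prints "$0.28"
```

Under these assumptions, classifying all 10,000 tickets costs about 28 cents. Real token counts vary by tokenizer and prompt, so treat the result as an order-of-magnitude estimate.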

Cheap doesn't mean toy: GPT-OSS 120B, GLM-4.7 Flash, Gemini 2.5 Flash-Lite, and Qwen3.5 9B punch far above their price on everyday tasks. Route high-volume work here and reserve frontier models for the requests that actually need them.

Frequently asked questions

What is the cheapest LLM API?

In this catalog, Llama 3.2 3B and Qwen3 4B start around $0.03 per million input tokens, with GPT-OSS 20B at about $0.04 and GLM-4.7 Flash at $0.06. Prices differ per provider, so check the list — the same model is often cheaper through one provider than another.
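As an illustration of how much the provider matters, the sketch below picks the cheapest offer for gpt-oss-120b using the prices copied from the list on this page. The token mix (1M input, 100K output) and the `cheapest` helper are assumptions for the example, not part of any catalog API.

```python
# (provider, input $/M tokens, output $/M tokens) for gpt-oss-120b,
# copied from the list on this page.
OFFERS = [
    ("Cerebras", 0.35, 0.75),
    ("Together AI", 0.15, 0.60),
    ("ByteDance", 0.10, 0.50),
    ("Azure", 0.15, 0.60),
    ("NanoGPT", 0.05, 0.25),
    ("Nebius AI", 0.15, 0.60),
    ("Groq", 0.15, 0.75),
]


def offer_cost(offer, input_tokens, output_tokens):
    """Dollar cost of a workload under one provider's prices."""
    _, input_price, output_price = offer
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def cheapest(offers, input_tokens, output_tokens):
    """Return the offer with the lowest total cost for this token mix."""
    return min(offers, key=lambda o: offer_cost(o, input_tokens, output_tokens))


best = cheapest(OFFERS, input_tokens=1_000_000, output_tokens=100_000)
print(best[0])  # prints "NanoGPT"
```

Price is only one axis. Throughput, latency, and rate limits also differ by provider, and this sketch ignores them.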

Are cheap models good enough for production?

For classification, extraction, routing, summarization, and simple chat — usually yes. Small models fail mostly on multi-step reasoning and niche knowledge. A common pattern is a cheap model as the default with automatic escalation to a frontier model when confidence is low.
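The escalation pattern can be sketched in a few lines. Everything here is hypothetical: the two model functions are stubs standing in for real API calls, and how a confidence score is obtained (token log-probabilities, a self-rating prompt, a schema validator) is left open, as is the 0.8 threshold.

```python
from typing import Callable, Tuple

# A model call returns (answer, confidence in [0, 1]).
ModelFn = Callable[[str], Tuple[str, float]]


def route(prompt: str, cheap: ModelFn, frontier: ModelFn,
          threshold: float = 0.8) -> Tuple[str, str]:
    """Try the cheap model first; escalate when its confidence is below threshold."""
    answer, confidence = cheap(prompt)
    if confidence >= threshold:
        return answer, "cheap"
    answer, _ = frontier(prompt)
    return answer, "frontier"


# Stub models for illustration only; replace with real API calls.
def cheap_stub(prompt: str) -> Tuple[str, float]:
    if "refund" in prompt.lower():
        return "billing", 0.95
    return "unknown", 0.30


def frontier_stub(prompt: str) -> Tuple[str, float]:
    return "technical", 0.99


print(route("Where is my refund?", cheap_stub, frontier_stub))
print(route("The API returns 500 after the upgrade", cheap_stub, frontier_stub))
```

The first request stays on the cheap model; the second falls below the threshold and is escalated. In practice the threshold is tuned against a labeled sample so that the escalation rate, and therefore the blended cost, stays predictable.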

How else can I cut LLM costs?

Cache responses for repeated requests, use cached input pricing for long shared prefixes, batch offline work, and set per-project spending limits. Routing through a gateway also lets you switch to whichever provider currently offers the lowest price for the same model with zero code changes.
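Response caching for repeated requests might look like the following minimal sketch. It is an in-memory, exact-match cache; the `call_model` function is a hypothetical stand-in for a real client, and a production setup would likely add expiry and a shared store.

```python
import hashlib
import json


class ResponseCache:
    """Exact-match cache keyed on a hash of (model, prompt)."""

    def __init__(self, call_model):
        self._call_model = call_model  # hypothetical function: (model, prompt) -> str
        self._store = {}
        self.hits = 0
        self.misses = 0

    def _key(self, model: str, prompt: str) -> str:
        payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def complete(self, model: str, prompt: str) -> str:
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = self._call_model(model, prompt)
        self._store[key] = result
        return result


# Stub model call for illustration; a real one would hit an API and cost money.
def fake_call(model: str, prompt: str) -> str:
    return prompt.upper()


cache = ResponseCache(fake_call)
cache.complete("gpt-oss-20b", "classify: late delivery")
cache.complete("gpt-oss-20b", "classify: late delivery")  # served from cache
print(cache.hits, cache.misses)  # prints "1 1"
```

Exact-match caching pays off mainly for deterministic tasks such as classification, where the same input should always produce the same output.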

Are there free models?

Yes — free mappings show up at $0.00 in this list. They're rate-limited and best for prototyping; for production traffic, the paid models on this page are the reliable low-cost option.
