Cheapest Models

Models at or under $0.20 per million input tokens ($1.50 output) — for classification, extraction, and high-volume workloads

Provider	Model	Input ($/M tokens)	Output ($/M tokens)	Cached input ($/M tokens)
Anthropic	claude-haiku-4-5-free	$0.00	$0.00	$0.00
Mistral AI	mistral-ocr-latest	$0.00	$0.00	—
DeepInfra	qwen3.5-9b	$0.10	$0.15	—
NovitaAI	gemma-4-26b-a4b-it	$0.07	$0.34	—
DeepInfra	gemma-4-26b-a4b-it	$0.07	$0.34	—
Cerebras	gemma-4-31b-it	$0.99	$1.49	—
NovitaAI	gemma-4-31b-it	$0.13	$0.38	—
DeepInfra	gemma-4-31b-it	$0.13	$0.38	—
Together AI	gemma-4-31b-it	$0.13	$0.38	—
ElevenLabs	eleven-turbo-v2-5	$0.055/1K chars	—	—
ElevenLabs	eleven-flash-v2-5	$0.055/1K chars	—	—
ElevenLabs	eleven-v3	$0.11/1K chars	—	—
ElevenLabs	eleven-multilingual-v2	$0.11/1K chars	—	—
Z AI	glm-4.7-flash-free	$0.00	$0.00	$0.00
OpenAI	tts-1-hd	$0.03/1K chars	—	—
OpenAI	tts-1	$0.015/1K chars	—	—
Xiaomi	mimo-v2.5	$0.14	$0.28	$0.00
Alibaba Cloud(singapore)	deepseek-v4-flash	$0.20	$0.40	$0.04
DeepSeek	deepseek-v4-flash	$0.14	$0.28	$0.00
DeepInfra	deepseek-v4-flash	$0.14	$0.28	$0.03
NovitaAI	deepseek-v4-flash	$0.14	$0.28	$0.03
Alibaba Cloud(cn-beijing)	deepseek-v4-flash	$0.14	$0.28	$0.03
Alibaba Cloud	deepseek-v4-flash	$0.20	$0.40	$0.04
Xiaomi	mimo-v2-flash	$0.10	$0.30	$0.02
EmberCloud	qwen3-coder-next	$0.11	$0.68	$0.06
OpenAI	gpt-5.4-nano	$0.20	$1.25	$0.02
Azure	gpt-5.4-nano	$0.20	$1.25	$0.02
Azure AI Foundry	grok-4-1-fast	$0.20	$0.50	—
xAI	grok-4-1-fast	$0.20	$0.50	$0.05
xAI	grok-4-fast	$0.20	$0.50	$0.05
Alibaba Cloud(singapore)	qwen35-397b-a17b	$0.60	$3.60	—
NovitaAI	qwen35-397b-a17b	$0.60	$3.60	—
Alibaba Cloud	qwen35-397b-a17b	$0.60	$3.60	—
Nebius AI	qwen35-397b-a17b	$0.60	$3.60	—
Alibaba Cloud(cn-beijing)	qwen35-397b-a17b	$0.17	$1.03	—
Mistral AI	devstral-small-2507	$0.10	$0.30	—
Mistral AI	ministral-3b-2512	$0.10	$0.10	—
Mistral AI	ministral-8b-2512	$0.15	$0.15	—
Mistral AI	ministral-14b-2512	$0.20	$0.20	—
Mistral AI	mistral-small-2506	$0.10	$0.30	—
MiniMax	minimax-m2.5	$0.30	$1.20	$0.03
EmberCloud	minimax-m2.5	$0.20	$1.20	$0.04
Nebius AI	minimax-m2.5	$0.30	$1.20	—
NovitaAI	minimax-m2.5	$0.30	$1.20	$0.03
Together AI	minimax-m2.5	$0.30	$1.20	—
NovitaAI	hermes-2-pro-llama-3-8b	$0.14	$0.14	—
NovitaAI	qwen3-4b-fp8	$0.03	$0.03	—
NovitaAI	qwen3-30b-a3b-fp8	$0.09	$0.45	—
NovitaAI	qwen3-32b-fp8	$0.10	$0.45	—
NovitaAI	qwen3-vl-30b-a3b-thinking	$0.20	$1.00	—

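Every model here is available from at least one provider at or under $0.20 per million input tokens and $1.50 per million output tokens; some, like Qwen3 4B and Llama 3.2 3B, cost as little as $0.03. Individual provider rows can exceed that threshold (Cerebras, for example, lists gemma-4-31b-it at $0.99 input), so compare rows for the same model. At these prices a million input tokens cost between 3 and 20 cents, which changes what's economical: classify every support ticket, summarize every call, run an LLM check on every commit.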
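To make the arithmetic concrete, here is a minimal sketch of a per-workload cost estimate. The function name `workload_cost_usd` and the token counts are illustrative assumptions; the prices are copied from the list above.

```python
def workload_cost_usd(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Return the USD cost of a workload, given prices per million tokens."""
    input_cost = (input_tokens / 1_000_000) * input_price_per_m
    output_cost = (output_tokens / 1_000_000) * output_price_per_m
    return input_cost + output_cost


# Assumed workload: 1M input tokens and 100K output tokens.
# NovitaAI qwen3-4b-fp8: $0.03 input, $0.03 output per million tokens.
small = workload_cost_usd(1_000_000, 100_000, 0.03, 0.03)

# OpenAI gpt-5.4-nano: $0.20 input, $1.25 output per million tokens.
nano = workload_cost_usd(1_000_000, 100_000, 0.20, 1.25)

print(f"qwen3-4b-fp8: ${small:.3f}")
print(f"gpt-5.4-nano: ${nano:.3f}")
```

Output-heavy workloads shift the balance, since output prices in the list are often several times the input price.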

Cheap doesn't mean toy: GPT-OSS 120B, GLM-4.7 Flash, Gemini 2.5 Flash-Lite, and Qwen3.5 9B punch far above their price on everyday tasks. Route high-volume work here and reserve frontier models for the requests that actually need them.

Frequently asked questions

What is the cheapest LLM API?

In this catalog, Llama 3.2 3B and Qwen3 4B start around $0.03 per million input tokens, with GPT-OSS 20B at about $0.04 and GLM-4.7 Flash at $0.06. Prices differ per provider, so check the list — the same model is often cheaper through one provider than another.
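A rough sketch of the per-provider comparison, using a handful of rows copied from the list above. The `cheapest_by_model` helper and the blended-price weighting (`output_ratio`, here one output token per four input tokens) are assumptions for illustration, not how the catalog ranks entries.

```python
# (provider, model, input $/M tokens, output $/M tokens), copied from the list.
ROWS = [
    ("Cerebras", "gemma-4-31b-it", 0.99, 1.49),
    ("NovitaAI", "gemma-4-31b-it", 0.13, 0.38),
    ("Alibaba Cloud", "deepseek-v4-flash", 0.20, 0.40),
    ("DeepSeek", "deepseek-v4-flash", 0.14, 0.28),
    ("MiniMax", "minimax-m2.5", 0.30, 1.20),
    ("EmberCloud", "minimax-m2.5", 0.20, 1.20),
]


def cheapest_by_model(rows, output_ratio=0.25):
    """Map each model to the provider with the lowest blended price.

    The blended price is input + output_ratio * output; on a tie,
    the first provider seen is kept.
    """
    best = {}
    for provider, model, input_price, output_price in rows:
        blended = input_price + output_ratio * output_price
        if model not in best or blended < best[model][1]:
            best[model] = (provider, blended)
    return {model: provider for model, (provider, _) in best.items()}


cheapest = cheapest_by_model(ROWS)
for model, provider in cheapest.items():
    print(f"{model}: {provider}")
```

Adjust `output_ratio` to match the real input-to-output mix of your workload before trusting the ranking.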

Are cheap models good enough for production?

For classification, extraction, routing, summarization, and simple chat — usually yes. Small models fail mostly on multi-step reasoning and niche knowledge. A common pattern is a cheap model as the default with automatic escalation to a frontier model when confidence is low.
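The escalation pattern can be sketched as follows. Everything here is hypothetical: the models are stand-in callables that return an answer plus a confidence score in [0, 1], and the 0.8 threshold is an arbitrary starting point. Real APIs do not return a confidence value directly, so in practice it would have to be derived (for example from token log-probabilities or a self-check prompt).

```python
from typing import Callable, Tuple

# Hypothetical interface: a model takes a prompt and returns (answer, confidence).
ModelFn = Callable[[str], Tuple[str, float]]


def answer_with_escalation(
    prompt: str,
    cheap: ModelFn,
    frontier: ModelFn,
    threshold: float = 0.8,
) -> Tuple[str, str]:
    """Try the cheap model first; escalate when its confidence is below threshold.

    Returns the answer and the tier ("cheap" or "frontier") that produced it.
    """
    answer, confidence = cheap(prompt)
    if confidence >= threshold:
        return answer, "cheap"
    answer, _ = frontier(prompt)
    return answer, "frontier"


# Stub models standing in for real API calls.
def stub_cheap(prompt: str) -> Tuple[str, float]:
    if "refund" in prompt:
        return "billing", 0.95
    return "unknown", 0.40


def stub_frontier(prompt: str) -> Tuple[str, float]:
    return "legal", 0.99


easy = answer_with_escalation("I want a refund", stub_cheap, stub_frontier)
hard = answer_with_escalation("Is clause 7 enforceable?", stub_cheap, stub_frontier)
print(easy)
print(hard)
```

Logging which tier answered each request shows how much traffic the cheap model actually absorbs.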

How else can I cut LLM costs?

Cache responses for repeated requests, use cached input pricing for long shared prefixes, batch offline work, and set per-project spending limits. Routing through a gateway also lets you switch to whichever provider currently offers the lowest price for the same model with zero code changes.
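As an illustration of the first suggestion, here is a minimal in-memory response cache keyed on the model name and prompt. The `ResponseCache` class and the `call_model` callable are hypothetical stand-ins for whatever client you use; a production version would also need expiry and a shared store.

```python
import hashlib
import json


class ResponseCache:
    """Cache model responses so identical requests are only paid for once."""

    def __init__(self, call_model):
        # call_model(model, prompt) -> str is an assumed stand-in for a real API call.
        self._call_model = call_model
        self._store = {}
        self.misses = 0

    def complete(self, model, prompt):
        # Hash the (model, prompt) pair so the key has a fixed size.
        raw = json.dumps([model, prompt]).encode("utf-8")
        key = hashlib.sha256(raw).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._call_model(model, prompt)
        return self._store[key]


def fake_call(model, prompt):
    """Stub that pretends to call a provider."""
    return f"{model} says: {prompt.upper()}"


cache = ResponseCache(fake_call)
first = cache.complete("qwen3-4b-fp8", "classify: late delivery")
second = cache.complete("qwen3-4b-fp8", "classify: late delivery")
third = cache.complete("qwen3-4b-fp8", "classify: wrong item")
print(cache.misses)
```

Exact-match caching only helps when requests repeat verbatim; provider-side cached input pricing covers the separate case of long shared prefixes.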

Are there free models?

Yes — free mappings show up at $0.00 in this list. They're rate-limited and best for prototyping; for production traffic, the paid models on this page are the reliable low-cost option.

