Cheapest Models

Models priced at or under $0.20 per million input tokens and $1.50 per million output tokens, suited to classification, extraction, and other high-volume workloads

104 models

Provider	Model	Input Price ($/M tokens)	Output Price ($/M tokens)	Cached Input Price ($/M tokens)
NovitaAI	qwen3-vl-30b-a3b-instruct	$0.20	$0.70	—
NovitaAI	qwen3-235b-a22b-fp8	$0.20	$0.80	—
NovitaAI	llama-3.2-3b-instruct	$0.03	$0.05	—
NovitaAI	llama-3-8b-instruct	$0.04	$0.04	—
Alibaba Cloud(cn-beijing)	qwen3-vl-flash	$0.02	$0.21	$0.00
Alibaba Cloud(singapore)	qwen3-vl-flash	$0.05	$0.40	$0.01
Alibaba Cloud(us-virginia)	qwen3-vl-flash	$0.02	$0.21	$0.00
Alibaba Cloud	qwen3-vl-flash	$0.05	$0.40	$0.01
Alibaba Cloud	qwen3-vl-plus	$0.20	$1.60	$0.04
Alibaba Cloud(us-virginia)	qwen3-vl-plus	$0.14	$1.43	$0.03
Alibaba Cloud(cn-beijing)	qwen3-vl-plus	$0.14	$1.43	$0.03
Alibaba Cloud(singapore)	qwen3-vl-plus	$0.20	$1.60	$0.04
Alibaba Cloud	qwen3-coder-flash	$0.30	$1.50	$0.06
Alibaba Cloud(cn-beijing)	qwen3-coder-flash	$0.14	$0.57	$0.03
Alibaba Cloud(us-virginia)	qwen3-coder-flash	$0.14	$0.57	$0.03
Alibaba Cloud(singapore)	qwen3-coder-flash	$0.30	$1.50	$0.06
MiniMax	minimax-text-01	$0.20	$1.10	—
MiniMax	minimax-m2.1-lightning	$0.12	$0.48	—
NovitaAI	qwen3-vl-8b-instruct	$0.08	$0.50	—
EmberCloud	glm-4.7-flash	$0.06	$0.40	$0.01
Z AI	glm-4.7-flashx	$0.07	$0.40	$0.01
ByteDance	seed-1-6-flash-250715	$0.07	$0.30	$0.01
OpenAI	gpt-4o-mini-search-preview	$0.15	$0.60	—
Z AI	glm-4.6v-flashx	$0.04	$0.40	$0.00
Z AI	glm-4.6v-flash	$0.00	$0.00	$0.00
xAI	grok-4-1-fast-non-reasoning	$0.20	$0.50	$0.05
Azure AI Foundry	grok-4-1-fast-non-reasoning	$0.20	$0.50	—
Azure AI Foundry	grok-4-1-fast-reasoning	$0.20	$0.50	—
xAI	grok-4-1-fast-reasoning	$0.20	$0.50	$0.05
MiniMax	minimax-m2	$0.20	$1.00	$0.03
AWS Bedrock	llama-4-scout-17b-instruct	$0.17	$0.66	—
NovitaAI	llama-4-scout-17b-instruct	$0.18	$0.59	—
Google AI Studio	gemini-2.5-flash-lite-preview-09-2025	$0.10	$0.40	$0.01
Google Vertex AI	gemini-2.5-flash-lite-preview-09-2025	$0.10	$0.40	$0.01
Google Vertex AI	gemini-2.5-flash-lite	$0.10	$0.40	$0.01
Google AI Studio	gemini-2.5-flash-lite	$0.10	$0.40	$0.01
xAI	grok-4-fast-non-reasoning	$0.20	$0.50	$0.05
xAI	grok-4-fast-reasoning	$0.20	$0.50	$0.05
Z AI	glm-4-32b-0414-128k	$0.10	$0.10	$0.00
Z AI	glm-4.5-flash	$0.00	$0.00	$0.00
Z AI	glm-4.5-air	$0.20	$1.10	$0.03
EmberCloud	glm-4.5-air	$0.13	$0.85	$0.02
NovitaAI	qwen3-next-80b-a3b-instruct	$0.15	$1.50	—
Vertex AI (OpenAI-compatible)	qwen3-next-80b-a3b-instruct	$0.15	$1.20	—
Alibaba Cloud	qwen3-next-80b-a3b-instruct	$0.50	$2.00	—
NovitaAI	qwen3-next-80b-a3b-thinking	$0.15	$1.50	—
Nebius AI	qwen3-next-80b-a3b-thinking	$0.15	$1.20	—
Vertex AI (OpenAI-compatible)	qwen3-next-80b-a3b-thinking	$0.15	$1.20	—
Alibaba Cloud	qwen3-next-80b-a3b-thinking	$0.50	$6.00	—
Nebius AI	qwen3-30b-a3b-thinking-2507	$0.10	$0.30	—

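Every model here is available from at least one provider for at most $0.20 per million input tokens and $1.50 per million output tokens; some listings of the same model through other providers or regions are priced higher. A few models, like Qwen3 4B and Llama 3.2 3B, cost as little as $0.03 per million input tokens. At these prices a million input tokens costs between 3 and 20 cents, which changes what's economical: classify every support ticket, summarize every call, run an LLM check on every commit.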
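To make the arithmetic concrete, here is a minimal sketch of a cost estimate. The prices are taken from the NovitaAI llama-3.2-3b-instruct row above; the workload (10,000 tickets at roughly 400 input and 20 output tokens each) and the function name `workload_cost_usd` are assumptions chosen for illustration.

```python
def workload_cost_usd(input_tokens, output_tokens,
                      input_price_per_m, output_price_per_m):
    """Return the cost in USD, given prices quoted per million tokens."""
    input_cost = (input_tokens / 1_000_000) * input_price_per_m
    output_cost = (output_tokens / 1_000_000) * output_price_per_m
    return input_cost + output_cost


# Hypothetical workload: classify 10,000 support tickets.
tickets = 10_000
input_tokens = tickets * 400   # 4,000,000 input tokens (assumed average)
output_tokens = tickets * 20   # 200,000 output tokens (assumed average)

# Prices from the table: NovitaAI llama-3.2-3b-instruct, $0.03 in / $0.05 out.
cost = workload_cost_usd(input_tokens, output_tokens, 0.03, 0.05)
print(f"Estimated cost: ${cost:.2f}")  # prints "Estimated cost: $0.13"
```

Output tokens are usually the more expensive side, so workloads with short outputs (labels, extracted fields) benefit most from these prices.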

Cheap doesn't mean toy: GPT-OSS 120B, GLM-4.7 Flash, Gemini 2.5 Flash-Lite, and Qwen3.5 9B punch far above their price on everyday tasks. Route high-volume work here and reserve frontier models for the requests that actually need them.

Frequently asked questions

What is the cheapest LLM API?

Among paid listings in this catalog, Llama 3.2 3B and Qwen3 4B start around $0.03 per million input tokens, with GPT-OSS 20B at about $0.04 and GLM-4.7 Flash at $0.06. Prices differ by provider, so check the list: the same model is often cheaper through one provider than another.
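As a rough illustration of comparing providers for the same model, the sketch below picks the cheapest listing for a given token mix. The three rows are copied from the qwen3-next-80b-a3b-instruct entries in the table above; the `Listing` type, the `cheapest_listing` helper, and the example token counts are hypothetical.

```python
from typing import NamedTuple


class Listing(NamedTuple):
    provider: str
    model: str
    input_price: float   # USD per million input tokens
    output_price: float  # USD per million output tokens


# Rows copied from the table for one model offered by three providers.
LISTINGS = [
    Listing("NovitaAI", "qwen3-next-80b-a3b-instruct", 0.15, 1.50),
    Listing("Vertex AI (OpenAI-compatible)", "qwen3-next-80b-a3b-instruct", 0.15, 1.20),
    Listing("Alibaba Cloud", "qwen3-next-80b-a3b-instruct", 0.50, 2.00),
]


def cheapest_listing(listings, model, input_tokens, output_tokens):
    """Return the listing with the lowest total cost for this token mix."""
    candidates = [item for item in listings if item.model == model]
    if not candidates:
        raise ValueError(f"no listing for model {model!r}")
    return min(
        candidates,
        key=lambda item: item.input_price * input_tokens
        + item.output_price * output_tokens,
    )


best = cheapest_listing(LISTINGS, "qwen3-next-80b-a3b-instruct",
                        input_tokens=1_000_000, output_tokens=100_000)
print(best.provider)  # prints "Vertex AI (OpenAI-compatible)"
```

Ranking by total cost for the expected token mix, instead of by input price alone, matters here: two of the three listings tie on input price and differ only on output.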

Are cheap models good enough for production?

For classification, extraction, routing, summarization, and simple chat, usually yes. Small models fail mostly on multi-step reasoning and niche knowledge. A common pattern is to use a cheap model as the default and escalate automatically to a frontier model when confidence is low.
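The escalation pattern can be sketched as follows. This is an illustration under stated assumptions, not a prescribed implementation: each model is represented by a plain function returning an answer and a confidence score between 0 and 1, and the two stub functions stand in for real API calls. How confidence is obtained (log probabilities, a self-reported score, a validator) is left open.

```python
from typing import Callable, Tuple

# Assumed interface: a model call returns (answer, confidence in [0, 1]).
ModelFn = Callable[[str], Tuple[str, float]]


def answer_with_escalation(prompt: str,
                           cheap_model: ModelFn,
                           frontier_model: ModelFn,
                           min_confidence: float = 0.8) -> Tuple[str, str]:
    """Try the cheap model first; escalate only when its confidence is low."""
    answer, confidence = cheap_model(prompt)
    if confidence >= min_confidence:
        return answer, "cheap"
    answer, _ = frontier_model(prompt)
    return answer, "frontier"


# Hypothetical stubs standing in for real model calls.
def cheap_stub(prompt: str) -> Tuple[str, float]:
    if "refund" in prompt.lower():
        return "billing", 0.95
    return "other", 0.40


def frontier_stub(prompt: str) -> Tuple[str, float]:
    return "technical", 0.99


print(answer_with_escalation("I want a refund", cheap_stub, frontier_stub))
print(answer_with_escalation("My build fails on step 3", cheap_stub, frontier_stub))
```

The threshold `min_confidence` controls the trade-off: raising it sends more traffic to the expensive model, lowering it accepts more low-confidence answers from the cheap one.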

How else can I cut LLM costs?

Cache responses for repeated requests, use cached input pricing for long shared prefixes, batch offline work, and set per-project spending limits. Routing through a gateway also lets you switch to whichever provider currently offers the lowest price for the same model with zero code changes.
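Response caching for repeated requests might look like the sketch below: an in-memory cache keyed by a hash of the model name and prompt. The `ResponseCache` class and the `fake_call` function are hypothetical; a production setup would more likely use a shared store with expiry, and this sketch only suits requests where an identical prompt should return an identical answer.

```python
import hashlib
import json


class ResponseCache:
    """In-memory cache that skips the model call for repeated requests."""

    def __init__(self, call_model):
        self._call_model = call_model  # function (model, prompt) -> response
        self._store = {}
        self.hits = 0
        self.misses = 0

    def _key(self, model, prompt):
        # Hash a canonical JSON encoding so the key is stable and compact.
        payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def complete(self, model, prompt):
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        response = self._call_model(model, prompt)
        self._store[key] = response
        return response


# Hypothetical stand-in for a paid API call.
def fake_call(model, prompt):
    return f"[{model}] label for: {prompt}"


cache = ResponseCache(fake_call)
cache.complete("glm-4.7-flash", "Classify: my invoice is wrong")
cache.complete("glm-4.7-flash", "Classify: my invoice is wrong")  # served from cache
print(cache.hits, cache.misses)  # prints "1 1"
```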

Are there free models?

Yes. Free mappings appear at $0.00 in this list, for example Z AI's glm-4.5-flash and glm-4.6v-flash. They're rate-limited and best for prototyping; for production traffic, the paid models on this page are the reliable low-cost option.
