    Cheapest Models

    Models priced at or under $0.20 per million input tokens and $1.50 per million output tokens, for classification, extraction, and high-volume workloads

    104 models · 44 providers · 36 vision models · 65 tool-enabled · 4 free models
    Prices are in $ per million tokens unless marked per 1K characters; "-" means no price is listed.

    Provider                   Model                      Input            Output  Cached input
    Anthropic                  claude-haiku-4-5-free      $0.00            $0.00   $0.00
    Mistral AI                 mistral-ocr-latest         $0.00            $0.00   -
    DeepInfra                  qwen3.5-9b                 $0.10            $0.15   -
    NovitaAI                   gemma-4-26b-a4b-it         $0.07            $0.34   -
    DeepInfra                  gemma-4-26b-a4b-it         $0.07            $0.34   -
    Cerebras                   gemma-4-31b-it             $0.99            $1.49   -
    NovitaAI                   gemma-4-31b-it             $0.13            $0.38   -
    DeepInfra                  gemma-4-31b-it             $0.13            $0.38   -
    Together AI                gemma-4-31b-it             $0.13            $0.38   -
    ElevenLabs                 eleven-turbo-v2-5          $0.055/1K chars  -       -
    ElevenLabs                 eleven-flash-v2-5          $0.055/1K chars  -       -
    ElevenLabs                 eleven-v3                  $0.11/1K chars   -       -
    ElevenLabs                 eleven-multilingual-v2     $0.11/1K chars   -       -
    Z AI                       glm-4.7-flash-free         $0.00            $0.00   $0.00
    OpenAI                     tts-1-hd                   $0.03/1K chars   -       -
    OpenAI                     tts-1                      $0.015/1K chars  -       -
    Xiaomi                     mimo-v2.5                  $0.14            $0.28   $0.00
    Alibaba Cloud(singapore)   deepseek-v4-flash          $0.20            $0.40   $0.04
    DeepSeek                   deepseek-v4-flash          $0.14            $0.28   $0.00
    DeepInfra                  deepseek-v4-flash          $0.14            $0.28   $0.03
    NovitaAI                   deepseek-v4-flash          $0.14            $0.28   $0.03
    Alibaba Cloud(cn-beijing)  deepseek-v4-flash          $0.14            $0.28   $0.03
    Alibaba Cloud              deepseek-v4-flash          $0.20            $0.40   $0.04
    Xiaomi                     mimo-v2-flash              $0.10            $0.30   $0.02
    EmberCloud                 qwen3-coder-next           $0.11            $0.68   $0.06
    OpenAI                     gpt-5.4-nano               $0.20            $1.25   $0.02
    Azure                      gpt-5.4-nano               $0.20            $1.25   $0.02
    Azure AI Foundry           grok-4-1-fast              $0.20            $0.50   -
    xAI                        grok-4-1-fast              $0.20            $0.50   $0.05
    xAI                        grok-4-fast                $0.20            $0.50   $0.05
    Alibaba Cloud(singapore)   qwen35-397b-a17b           $0.60            $3.60   -
    NovitaAI                   qwen35-397b-a17b           $0.60            $3.60   -
    Alibaba Cloud              qwen35-397b-a17b           $0.60            $3.60   -
    Nebius AI                  qwen35-397b-a17b           $0.60            $3.60   -
    Alibaba Cloud(cn-beijing)  qwen35-397b-a17b           $0.17            $1.03   -
    Mistral AI                 devstral-small-2507        $0.10            $0.30   -
    Mistral AI                 ministral-3b-2512          $0.10            $0.10   -
    Mistral AI                 ministral-8b-2512          $0.15            $0.15   -
    Mistral AI                 ministral-14b-2512         $0.20            $0.20   -
    Mistral AI                 mistral-small-2506         $0.10            $0.30   -
    MiniMax                    minimax-m2.5               $0.30            $1.20   $0.03
    EmberCloud                 minimax-m2.5               $0.20            $1.20   $0.04
    Nebius AI                  minimax-m2.5               $0.30            $1.20   -
    NovitaAI                   minimax-m2.5               $0.30            $1.20   $0.03
    Together AI                minimax-m2.5               $0.30            $1.20   -
    NovitaAI                   hermes-2-pro-llama-3-8b    $0.14            $0.14   -
    NovitaAI                   qwen3-4b-fp8               $0.03            $0.03   -
    NovitaAI                   qwen3-30b-a3b-fp8          $0.09            $0.45   -
    NovitaAI                   qwen3-32b-fp8              $0.10            $0.45   -
    NovitaAI                   qwen3-vl-30b-a3b-thinking  $0.20            $1.00   -
    Page 1 of 4

    Every model here is available from at least one provider at no more than $0.20 per million input tokens and $1.50 per million output tokens; some, like Qwen3 4B and Llama 3.2 3B, cost as little as $0.03 per million input tokens. At these prices a million input tokens cost 20 cents or less, which changes what's economical: classify every support ticket, summarize every call, run an LLM check on every commit.
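
    To make the arithmetic concrete, here is a minimal sketch of a per-workload cost estimate. The function name `workload_cost` and the 1,000,000-input / 100,000-output token split are illustrative assumptions; the prices are taken from the list above.

```python
def workload_cost(input_tokens: int, output_tokens: int,
                  input_price: float, output_price: float) -> float:
    """Return the cost in dollars; prices are in dollars per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Assumed workload: 1,000,000 input tokens and 100,000 output tokens.
# qwen3-4b-fp8 via NovitaAI: $0.03 input, $0.03 output per million tokens.
small = workload_cost(1_000_000, 100_000, 0.03, 0.03)
# gpt-5.4-nano via OpenAI: $0.20 input, $1.25 output per million tokens.
nano = workload_cost(1_000_000, 100_000, 0.20, 1.25)

print(f"qwen3-4b-fp8: ${small:.3f}")
print(f"gpt-5.4-nano: ${nano:.3f}")
```

    Under these assumptions the same workload costs about 3 cents on the cheapest listed model and about 33 cents at the top of this page's price range.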

    Cheap doesn't mean toy: GPT-OSS 120B, GLM-4.7 Flash, Gemini 2.5 Flash-Lite, and Qwen3.5 9B punch far above their price on everyday tasks. Route high-volume work here and reserve frontier models for the requests that actually need them.

    Frequently asked questions

    What is the cheapest LLM API?

    In this catalog, Llama 3.2 3B and Qwen3 4B start around $0.03 per million input tokens, with GPT-OSS 20B at about $0.04 and GLM-4.7 Flash at $0.06. Prices differ per provider, so check the list — the same model is often cheaper through one provider than another.
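
    The per-provider spread can be checked mechanically. The sketch below picks the cheapest offer for a model from a few rows copied from the list above. The `Offer` type, the `cheapest_offer` helper, and the blended-price weighting (`output_ratio`, an assumed number of output tokens per input token) are hypothetical and not part of any gateway API.

```python
from typing import List, NamedTuple


class Offer(NamedTuple):
    provider: str
    model: str
    input_price: float   # dollars per million input tokens
    output_price: float  # dollars per million output tokens


# A few rows copied from the price list on this page.
OFFERS = [
    Offer("Alibaba Cloud(singapore)", "qwen35-397b-a17b", 0.60, 3.60),
    Offer("NovitaAI", "qwen35-397b-a17b", 0.60, 3.60),
    Offer("Alibaba Cloud(cn-beijing)", "qwen35-397b-a17b", 0.17, 1.03),
    Offer("Cerebras", "gemma-4-31b-it", 0.99, 1.49),
    Offer("NovitaAI", "gemma-4-31b-it", 0.13, 0.38),
]


def cheapest_offer(model: str, offers: List[Offer],
                   output_ratio: float = 0.25) -> Offer:
    """Return the offer with the lowest blended price for `model`.

    `output_ratio` is an assumed number of output tokens per input token;
    adjust it to match the real workload.
    """
    candidates = [o for o in offers if o.model == model]
    if not candidates:
        raise ValueError(f"no offers for model {model!r}")
    return min(candidates,
               key=lambda o: o.input_price + output_ratio * o.output_price)


best = cheapest_offer("qwen35-397b-a17b", OFFERS)
print(best.provider, best.input_price, best.output_price)
```

    Weighting input and output prices together matters because a provider can be cheaper on one and more expensive on the other.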

    Are cheap models good enough for production?

    For classification, extraction, routing, summarization, and simple chat — usually yes. Small models fail mostly on multi-step reasoning and niche knowledge. A common pattern is a cheap model as the default with automatic escalation to a frontier model when confidence is low.
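
    The escalation pattern can be sketched as follows. Everything here is an assumption for illustration: the cheap model is taken to return a label with a confidence score, the 0.8 threshold is arbitrary, and the two stub functions stand in for real model calls.

```python
from typing import Callable, Tuple

# Hypothetical call signatures: the cheap model returns (label, confidence),
# the frontier model returns only a label.
CheapModel = Callable[[str], Tuple[str, float]]
FrontierModel = Callable[[str], str]


def classify_with_escalation(text: str, cheap: CheapModel,
                             frontier: FrontierModel,
                             threshold: float = 0.8) -> Tuple[str, str]:
    """Use the cheap model by default; escalate when its confidence is low.

    Returns (label, tier), where tier records which model answered.
    """
    label, confidence = cheap(text)
    if confidence >= threshold:
        return label, "cheap"
    return frontier(text), "frontier"


def stub_cheap(text: str) -> Tuple[str, float]:
    """Stand-in for a cheap model: confident only on an obvious keyword."""
    if "refund" in text.lower():
        return "billing", 0.95
    return "other", 0.40


def stub_frontier(text: str) -> str:
    """Stand-in for a frontier model call."""
    return "technical"


print(classify_with_escalation("I want a refund", stub_cheap, stub_frontier))
print(classify_with_escalation("The webhook retries twice", stub_cheap, stub_frontier))
```

    Logging the returned tier makes it possible to measure how often requests escalate, which determines the real blended cost.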

    How else can I cut LLM costs?

    Cache responses for repeated requests, use cached input pricing for long shared prefixes, batch offline work, and set per-project spending limits. Routing through a gateway also lets you switch to whichever provider currently offers the lowest price for the same model with zero code changes.
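
    As one illustration of the first suggestion, here is a minimal in-memory response cache. The `ResponseCache` class and the `call_model(model, prompt)` callable are hypothetical stand-ins for whatever client is in use; a production setup would likely add expiry and shared storage.

```python
import hashlib
import json
from typing import Callable, Dict, List


class ResponseCache:
    """Cache responses keyed by a hash of the model name and prompt."""

    def __init__(self, call_model: Callable[[str, str], str]) -> None:
        self._call_model = call_model  # hypothetical model-calling function
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def complete(self, model: str, prompt: str) -> str:
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        response = self._call_model(model, prompt)
        self._store[key] = response
        return response


calls: List[str] = []


def fake_call(model: str, prompt: str) -> str:
    """Stand-in for a paid API call; records each invocation."""
    calls.append(prompt)
    return f"[{model}] reply to: {prompt}"


cache = ResponseCache(fake_call)
first = cache.complete("qwen3-4b-fp8", "Classify: my card was charged twice")
second = cache.complete("qwen3-4b-fp8", "Classify: my card was charged twice")
print(cache.hits, cache.misses, len(calls))
```

    The second identical request is served from the cache, so only one paid call is made. Exact-match caching only helps when requests repeat verbatim.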

    Are there free models?

    Yes — free mappings show up at $0.00 in this list. They're rate-limited and best for prototyping; for production traffic, the paid models on this page are the reliable low-cost option.

    © 2026 LLM Gateway. All rights reserved.