    LLM Gateway

    Cheapest Models

    Models at or under $0.20 per million input tokens ($1.50 output) — for classification, extraction, and high-volume workloads

    104 models, 44 providers, 36 vision models, 65 tool-enabled, 4 free models
    Prices are in USD per million tokens; n/a means no price is listed. The third price column is unlabeled in the source and is assumed to be cached input.

    Provider                       Model                                  Input    Output   Cached input
    NovitaAI                       qwen3-vl-30b-a3b-instruct              $0.20    $0.70    n/a
    NovitaAI                       qwen3-235b-a22b-fp8                    $0.20    $0.80    n/a
    NovitaAI                       llama-3.2-3b-instruct                  $0.03    $0.05    n/a
    NovitaAI                       llama-3-8b-instruct                    $0.04    $0.04    n/a
    Alibaba Cloud (cn-beijing)     qwen3-vl-flash                         $0.02    $0.21    $0.00
    Alibaba Cloud (singapore)      qwen3-vl-flash                         $0.05    $0.40    $0.01
    Alibaba Cloud (us-virginia)    qwen3-vl-flash                         $0.02    $0.21    $0.00
    Alibaba Cloud                  qwen3-vl-flash                         $0.05    $0.40    $0.01
    Alibaba Cloud                  qwen3-vl-plus                          $0.20    $1.60    $0.04
    Alibaba Cloud (us-virginia)    qwen3-vl-plus                          $0.14    $1.43    $0.03
    Alibaba Cloud (cn-beijing)     qwen3-vl-plus                          $0.14    $1.43    $0.03
    Alibaba Cloud (singapore)      qwen3-vl-plus                          $0.20    $1.60    $0.04
    Alibaba Cloud                  qwen3-coder-flash                      $0.30    $1.50    $0.06
    Alibaba Cloud (cn-beijing)     qwen3-coder-flash                      $0.14    $0.57    $0.03
    Alibaba Cloud (us-virginia)    qwen3-coder-flash                      $0.14    $0.57    $0.03
    Alibaba Cloud (singapore)      qwen3-coder-flash                      $0.30    $1.50    $0.06
    MiniMax                        minimax-text-01                        $0.20    $1.10    n/a
    MiniMax                        minimax-m2.1-lightning                 $0.12    $0.48    n/a
    NovitaAI                       qwen3-vl-8b-instruct                   $0.08    $0.50    n/a
    EmberCloud                     glm-4.7-flash                          $0.06    $0.40    $0.01
    Z AI                           glm-4.7-flashx                         $0.07    $0.40    $0.01
    ByteDance                      seed-1-6-flash-250715                  $0.07    $0.30    $0.01
    OpenAI                         gpt-4o-mini-search-preview             $0.15    $0.60    n/a
    Z AI                           glm-4.6v-flashx                        $0.04    $0.40    $0.00
    Z AI                           glm-4.6v-flash                         $0.00    $0.00    $0.00
    xAI                            grok-4-1-fast-non-reasoning            $0.20    $0.50    $0.05
    Azure AI Foundry               grok-4-1-fast-non-reasoning            $0.20    $0.50    n/a
    Azure AI Foundry               grok-4-1-fast-reasoning                $0.20    $0.50    n/a
    xAI                            grok-4-1-fast-reasoning                $0.20    $0.50    $0.05
    MiniMax                        minimax-m2                             $0.20    $1.00    $0.03
    AWS Bedrock                    llama-4-scout-17b-instruct             $0.17    $0.66    n/a
    NovitaAI                       llama-4-scout-17b-instruct             $0.18    $0.59    n/a
    Google AI Studio               gemini-2.5-flash-lite-preview-09-2025  $0.10    $0.40    $0.01
    Google Vertex AI               gemini-2.5-flash-lite-preview-09-2025  $0.10    $0.40    $0.01
    Google Vertex AI               gemini-2.5-flash-lite                  $0.10    $0.40    $0.01
    Google AI Studio               gemini-2.5-flash-lite                  $0.10    $0.40    $0.01
    xAI                            grok-4-fast-non-reasoning              $0.20    $0.50    $0.05
    xAI                            grok-4-fast-reasoning                  $0.20    $0.50    $0.05
    Z AI                           glm-4-32b-0414-128k                    $0.10    $0.10    $0.00
    Z AI                           glm-4.5-flash                          $0.00    $0.00    $0.00
    Z AI                           glm-4.5-air                            $0.20    $1.10    $0.03
    EmberCloud                     glm-4.5-air                            $0.13    $0.85    $0.02
    NovitaAI                       qwen3-next-80b-a3b-instruct            $0.15    $1.50    n/a
    Vertex AI (OpenAI-compatible)  qwen3-next-80b-a3b-instruct            $0.15    $1.20    n/a
    Alibaba Cloud                  qwen3-next-80b-a3b-instruct            $0.50    $2.00    n/a
    NovitaAI                       qwen3-next-80b-a3b-thinking            $0.15    $1.50    n/a
    Nebius AI                      qwen3-next-80b-a3b-thinking            $0.15    $1.20    n/a
    Vertex AI (OpenAI-compatible)  qwen3-next-80b-a3b-thinking            $0.15    $1.20    n/a
    Alibaba Cloud                  qwen3-next-80b-a3b-thinking            $0.50    $6.00    n/a
    Nebius AI                      qwen3-30b-a3b-thinking-2507            $0.10    $0.30    n/a
    Page 2 of 4

    Every model here is available from at least one listed provider for at most $0.20 per million input tokens and $1.50 per million output tokens. Some, like Qwen3 4B and Llama 3.2 3B, start as low as $0.03 per million input tokens. At these prices a million input tokens cost between 3 and 20 cents, which changes what's economical: classify every support ticket, summarize every call, run an LLM check on every commit.
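
    To make the arithmetic concrete, here is a minimal sketch of a per-workload cost estimate. The function name and the 100,000-token output volume are assumptions chosen for illustration; the prices are the llama-3.2-3b-instruct (NovitaAI) figures from the list above.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)


def workload_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Return the USD cost of a workload given per-million-token prices.

    Prices are passed as strings so Decimal keeps them exact.
    """
    input_cost = Decimal(input_tokens) / MILLION * Decimal(input_price_per_m)
    output_cost = Decimal(output_tokens) / MILLION * Decimal(output_price_per_m)
    return input_cost + output_cost


# Assumed workload: 1M input tokens and 100k output tokens
# at $0.03 input / $0.05 output per million tokens.
cost = workload_cost(1_000_000, 100_000, "0.03", "0.05")
print(f"${cost:.4f}")  # prints $0.0350
```

    Output tokens are usually priced higher than input tokens, so workloads that generate long responses will land further above the input-only figure.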

    Cheap doesn't mean toy: GPT-OSS 120B, GLM-4.7 Flash, Gemini 2.5 Flash-Lite, and Qwen3.5 9B punch far above their price on everyday tasks. Route high-volume work here and reserve frontier models for the requests that actually need them.

    Frequently asked questions

    What is the cheapest LLM API?

    In this catalog, Llama 3.2 3B and Qwen3 4B start around $0.03 per million input tokens, with GPT-OSS 20B at about $0.04 and GLM-4.7 Flash at $0.06. Prices differ per provider, so check the list — the same model is often cheaper through one provider than another.
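
    The per-provider difference can be checked mechanically. The sketch below copies a few rows from the list on this page and picks the cheapest provider for a model; the `cheapest_provider` helper and the `input_share` weighting (the assumed fraction of tokens that are input) are illustrative, not part of any gateway API.

```python
# (provider, model, input $/M, output $/M) rows copied from the list on this page.
ROWS = [
    ("Z AI", "glm-4.5-air", 0.20, 1.10),
    ("EmberCloud", "glm-4.5-air", 0.13, 0.85),
    ("AWS Bedrock", "llama-4-scout-17b-instruct", 0.17, 0.66),
    ("NovitaAI", "llama-4-scout-17b-instruct", 0.18, 0.59),
    ("NovitaAI", "qwen3-next-80b-a3b-instruct", 0.15, 1.50),
    ("Vertex AI (OpenAI-compatible)", "qwen3-next-80b-a3b-instruct", 0.15, 1.20),
    ("Alibaba Cloud", "qwen3-next-80b-a3b-instruct", 0.50, 2.00),
]


def cheapest_provider(rows, model, input_share=0.5):
    """Return the provider with the lowest blended price for `model`.

    `input_share` is the assumed fraction of tokens that are input tokens;
    the blended price weights input and output prices accordingly.
    """
    candidates = [row for row in rows if row[1] == model]
    if not candidates:
        raise KeyError(model)

    def blended(row):
        return input_share * row[2] + (1 - input_share) * row[3]

    return min(candidates, key=blended)[0]


print(cheapest_provider(ROWS, "glm-4.5-air"))
print(cheapest_provider(ROWS, "llama-4-scout-17b-instruct", input_share=0.9))
print(cheapest_provider(ROWS, "llama-4-scout-17b-instruct", input_share=0.5))
```

    For llama-4-scout-17b-instruct the answer depends on the token mix: one provider has the lower input price and the other the lower output price, so an input-heavy workload and an output-heavy workload can favor different providers.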

    Are cheap models good enough for production?

    For classification, extraction, routing, summarization, and simple chat — usually yes. Small models fail mostly on multi-step reasoning and niche knowledge. A common pattern is a cheap model as the default with automatic escalation to a frontier model when confidence is low.
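
    The escalation pattern described above might look roughly like this. It is a sketch under stated assumptions: `cheap` and `frontier` are hypothetical callables standing in for real model calls, each returning an answer together with a confidence score, and the 0.8 threshold is an arbitrary placeholder. How confidence is obtained (log-probabilities, a self-check prompt, a validator) is left open.

```python
from typing import Callable, Tuple

# Each callable takes a prompt and returns (answer, confidence in [0, 1]).
# Both are hypothetical stand-ins for real model calls.
ModelFn = Callable[[str], Tuple[str, float]]


def answer_with_escalation(
    prompt: str,
    cheap: ModelFn,
    frontier: ModelFn,
    threshold: float = 0.8,
) -> Tuple[str, str]:
    """Return (answer, tier), where tier records which model answered."""
    answer, confidence = cheap(prompt)
    if confidence >= threshold:
        return answer, "cheap"
    # Low confidence: pay for the frontier model on this request only.
    answer, _ = frontier(prompt)
    return answer, "frontier"


# Stub models for illustration: the cheap one is unsure about long prompts.
def cheap_stub(prompt: str) -> Tuple[str, float]:
    return "cheap answer", 0.95 if len(prompt) < 40 else 0.30


def frontier_stub(prompt: str) -> Tuple[str, float]:
    return "frontier answer", 0.99


print(answer_with_escalation("Is this ticket about billing?", cheap_stub, frontier_stub))
print(answer_with_escalation(
    "Explain step by step why the reconciliation job double-counts refunds.",
    cheap_stub,
    frontier_stub,
))
```

    Returning the tier alongside the answer makes it easy to log how often requests escalate, which is the number that determines the blended cost.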

    How else can I cut LLM costs?

    Cache responses for repeated requests, use cached input pricing for long shared prefixes, batch offline work, and set per-project spending limits. Routing through a gateway also lets you switch to whichever provider currently offers the lowest price for the same model with zero code changes.
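
    As one illustration of the first suggestion, a response cache for repeated requests can be as small as the sketch below. The `ResponseCache` class and the `call_model(model, prompt)` callable are hypothetical; a real deployment would likely add expiry and a shared store rather than an in-process dictionary.

```python
import hashlib
import json


class ResponseCache:
    """In-memory cache keyed on the exact (model, prompt) pair."""

    def __init__(self, call_model):
        # call_model is a hypothetical callable: (model, prompt) -> response text.
        self._call_model = call_model
        self._store = {}
        self.hits = 0
        self.misses = 0

    def complete(self, model, prompt):
        # Hash the request so long prompts do not become long dictionary keys.
        raw = json.dumps([model, prompt]).encode("utf-8")
        key = hashlib.sha256(raw).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        response = self._call_model(model, prompt)
        self._store[key] = response
        return response


def fake_model(model, prompt):
    # Stand-in for a paid API call.
    return f"[{model}] reply to: {prompt}"


cache = ResponseCache(fake_model)
cache.complete("glm-4.7-flash", "Classify: refund request")
cache.complete("glm-4.7-flash", "Classify: refund request")  # served from cache
print(cache.hits, cache.misses)  # prints 1 1
```

    Exact-match caching only pays off when identical requests recur; prompts that differ by a timestamp or user ID will miss every time unless those fields are normalized first.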

    Are there free models?

    Yes — free mappings show up at $0.00 in this list. They're rate-limited and best for prototyping; for production traffic, the paid models on this page are the reliable low-cost option.

    © 2026 LLM Gateway. All rights reserved.