
    Best Models for Math

    Reasoning models for competition math, quantitative analysis, and step-by-step problem solving

    18 models, 44 providers, 12 vision models, 18 tool-enabled, 0 free models
    Prices are in $ per million tokens.

    Provider                       Model                          Input     Output    Third price
    Granite                        glm-5.2                        $1.12     $3.52     $0.21
    Z AI                           glm-5.2                        $1.40     $4.40     $0.26
    EmberCloud                     glm-5.2                        $1.26     $3.96     $0.23
    AWS Bedrock(us)                claude-fable-5                 $11.00    $55.00    $1.10
    AWS Bedrock(global)            claude-fable-5                 $10.00    $50.00    $1.00
    Anthropic                      claude-fable-5                 $10.00    $50.00    $1.00
    AWS Bedrock                    claude-fable-5                 $10.00    $50.00    $1.00
    MiniMax                        minimax-m3                     $0.60     $2.40     $0.12
    AWS Bedrock(jp)                claude-opus-4-8                $5.50     $27.50    $0.55
    AWS Bedrock(us)                claude-opus-4-8                $5.50     $27.50    $0.55
    Anthropic                      claude-opus-4-8                $5.00     $25.00    $0.50
    AWS Bedrock                    claude-opus-4-8                $5.00     $25.00    $0.50
    AWS Bedrock(global)            claude-opus-4-8                $5.00     $25.00    $0.50
    AWS Bedrock(au)                claude-opus-4-8                $5.50     $27.50    $0.55
    AWS Bedrock(eu)                claude-opus-4-8                $5.50     $27.50    $0.55
    Vertex AI (OpenAI-compatible)  grok-4-20-reasoning            $1.25     $2.50     $0.20
    Xiaomi                         mimo-v2.5-pro                  $0.43     $0.87     $0.00
    AWS Bedrock(global)            grok-4-3                       $1.25     $2.50     $0.20
    AWS Bedrock(us)                grok-4-3                       $1.38     $2.75     $0.22
    xAI                            grok-4-3                       $1.25     $2.50     $0.31
    AWS Bedrock(us-west-2)         grok-4-3                       $1.38     $2.75     $0.22
    AWS Bedrock                    grok-4-3                       $1.25     $2.50     $0.20
    Azure AI Foundry               grok-4-3                       $1.25     $2.50     $0.20
    Alibaba Cloud                  qwen3.6-max-preview            $1.30     $7.80     $0.13
    Alibaba Cloud(singapore)       qwen3.6-max-preview            $1.30     $7.80     $0.13
    OpenAI                         gpt-5.5-pro                    $30.00    $180.00   n/a
    Azure                          gpt-5.5                        $5.00     $30.00    $0.50
    OpenAI                         gpt-5.5                        $5.00     $30.00    $0.50
    DeepSeek                       deepseek-v4-pro                $0.43     $0.87     $0.00
    Alibaba Cloud(singapore)       deepseek-v4-pro                $2.40     $4.80     $0.20
    Together AI                    deepseek-v4-pro                $1.74     $3.48     $0.20
    Alibaba Cloud(cn-beijing)      deepseek-v4-pro                $1.65     $3.30     $0.14
    Alibaba Cloud                  deepseek-v4-pro                $2.40     $4.80     $0.20
    DeepInfra                      deepseek-v4-pro                $1.74     $3.48     $0.14
    Google AI Studio               gemini-pro-latest              $2.00     $12.00    $0.20
    Azure                          o4-mini                        $1.10     $4.40     $0.28
    OpenAI                         o4-mini                        $1.10     $4.40     $0.28
    Azure                          gpt-5.4-pro                    $30.00    $180.00   n/a
    OpenAI                         gpt-5.4-pro                    $30.00    $180.00   n/a
    Quartz                         gemini-3.1-pro-preview         $2.00     $12.00    $0.20
    Google AI Studio               gemini-3.1-pro-preview         $2.00     $12.00    $0.20
    Google Vertex AI               gemini-3.1-pro-preview         $2.00     $12.00    $0.20
    Azure                          gpt-5.2-pro                    $21.00    $168.00   n/a
    OpenAI                         gpt-5.2-pro                    $21.00    $168.00   n/a
    Alibaba Cloud                  kimi-k2-thinking               $0.57     $2.29     n/a
    Moonshot AI                    kimi-k2-thinking               $0.60     $2.50     $0.15
    Alibaba Cloud(cn-beijing)      kimi-k2-thinking               $0.57     $2.29     n/a
    Vertex AI (OpenAI-compatible)  kimi-k2-thinking               $0.60     $2.50     $0.06
    ByteDance                      kimi-k2-thinking               $0.60     $2.50     $0.12
    Nebius AI                      qwen3-235b-a22b-thinking-2507  $0.20     $0.60     n/a

    Granite's glm-5.2 prices include a 20% discount from the list prices of $1.40, $4.40, and $0.26.
    Page 1 of 2

    Math is where reasoning models earn their keep: spending thinking tokens before answering makes them far more accurate on competition problems, proofs, and multi-step quantitative work. The strongest options are OpenAI's Pro-tier models, Claude Opus, and Gemini Pro. Open-weight reasoners such as DeepSeek V4, Qwen's thinking models, and Xiaomi's MiMo cost far less.

    All of them are available through the same API here, so you can tune thinking budgets, compare answers across models, and route easy problems to cheap models while sending the hard ones to a Pro tier.
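One way to route easy problems to a cheap model and hard ones to a Pro tier is to let disagreement decide. The sketch below samples the cheap model a few times and escalates only when the answers differ. It is an illustration, not the gateway's own routing logic: `ask` is a hypothetical callable that you would wire to whatever client you use, the model IDs are taken from the table above, and comparing answers by exact string match is a simplification.

```python
from collections import Counter
from typing import Callable, Tuple

# Model IDs as they appear in the listing; swap in whichever pair you prefer.
CHEAP_MODEL = "deepseek-v4-pro"
PRO_MODEL = "gpt-5.5-pro"


def solve_with_escalation(
    problem: str,
    ask: Callable[[str, str], str],  # hypothetical: ask(model_id, problem) -> answer text
    samples: int = 3,
) -> Tuple[str, str]:
    """Return (answer, model_used).

    The cheap model is sampled `samples` times. If every sample gives the
    same answer, that answer is accepted; otherwise the problem is treated
    as hard and sent once to the Pro-tier model.
    """
    answers = [ask(CHEAP_MODEL, problem).strip() for _ in range(samples)]
    top_answer, votes = Counter(answers).most_common(1)[0]
    if votes == samples:
        return top_answer, CHEAP_MODEL
    return ask(PRO_MODEL, problem).strip(), PRO_MODEL
```

Requiring unanimous agreement is deliberately conservative. Accepting a simple majority would escalate less often at the price of more wrong answers slipping through.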

    Frequently asked questions

    What is the best LLM for math?

    GPT-5.5 Pro and GPT-5.4 Pro top most math evaluations, with Claude Opus 4.8 and Gemini 3.1 Pro close behind. DeepSeek V4 Pro and Qwen's 235B thinking model get remarkably close at a fraction of the cost, which makes them the default choice for high-volume math workloads.

    Do I need a reasoning model for math?

    For anything beyond arithmetic and simple algebra, yes. Reasoning models work through problems step by step before answering and are far more reliable on competition-style and multi-step problems. Most models here let you cap the thinking budget so you control cost per problem.
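To pick a thinking budget from a cost target, you can work backwards from the output price. The helper below is a rough sketch under two assumptions: thinking tokens are billed at the output rate, and input-token cost is ignored. The function name and parameters are illustrative, and how the resulting cap is passed to a given model depends on that model's API, which is not covered here.

```python
from decimal import Decimal, ROUND_FLOOR


def max_thinking_tokens(
    cost_ceiling_usd: str,
    output_price_per_m: str,
    answer_tokens: int = 0,
) -> int:
    """Largest thinking budget that keeps one problem under a dollar ceiling.

    Prices are passed as strings and handled as Decimal so that values
    such as "0.05" do not pick up binary floating-point error.
    `answer_tokens` reserves room for the visible answer, which is billed
    at the same output rate.
    """
    total_tokens = (
        Decimal(cost_ceiling_usd) / Decimal(output_price_per_m) * 1_000_000
    ).to_integral_value(rounding=ROUND_FLOOR)
    return max(int(total_tokens) - answer_tokens, 0)


# A $0.05 ceiling at $25.00 per million output tokens, keeping 500 tokens
# for the final answer, leaves 1500 tokens of thinking.
budget = max_thinking_tokens("0.05", "25.00", answer_tokens=500)
```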

    Can LLMs be trusted for calculations?

    Not blindly. Models still make arithmetic slips inside otherwise-correct reasoning. For production use, pair the model with tool calling so it can call a calculator or run code, and use the LLM to set up and interpret the math rather than to crunch raw numbers.
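As an illustration of the calculator side of that pairing, here is a minimal arithmetic evaluator you could expose as a tool. It is a sketch, not a recommended production tool: the `calculate` name is hypothetical, the tool-schema wiring is omitted, and only `+`, `-`, `*`, `/`, and `%` on plain numbers are supported. It walks the parsed expression tree rather than calling `eval`, so text such as function calls or attribute access is rejected.

```python
import ast
import operator

# Whitelisted operators. Exponentiation is left out on purpose, because a
# huge exponent could tie up the process.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def calculate(expression: str) -> float:
    """Evaluate a basic arithmetic expression without using eval()."""

    def _eval(node: ast.AST) -> float:
        # Only int and float literals are accepted (bools and strings are not).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval").body)
```

The model then produces the expression, for example `17 * 23 + 4`, and your code produces the number.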

    How much do reasoning tokens cost?

    Reasoning tokens bill as output tokens, and hard problems can burn thousands of them, so output price dominates the cost of a math workload. In the list above, DeepSeek V4 Pro costs as little as $0.87 per million output tokens, while GPT-5.5 Pro costs $180.00, roughly 200 times more per token.
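The arithmetic is simple enough to sketch. The example below estimates the output cost of a single problem from the prices in the list above. The 8,000-token figure is an assumed value for illustration, not a measurement, and input-token cost is left out.

```python
def output_cost_usd(output_tokens: int, output_price_per_m: float) -> float:
    """Cost of the output side of one request, reasoning tokens included."""
    return output_tokens * output_price_per_m / 1_000_000


# Assumed for illustration: a hard problem that uses 8,000 output tokens.
TOKENS_PER_PROBLEM = 8_000

deepseek_cost = output_cost_usd(TOKENS_PER_PROBLEM, 0.87)   # deepseek-v4-pro via DeepSeek
pro_cost = output_cost_usd(TOKENS_PER_PROBLEM, 180.00)      # gpt-5.5-pro via OpenAI

print(f"deepseek-v4-pro: ${deepseek_cost:.4f} per problem")
print(f"gpt-5.5-pro:     ${pro_cost:.2f} per problem")
print(f"ratio:           {pro_cost / deepseek_cost:.0f}x")
```

Under that assumption the cheap model costs well under a cent per problem and the Pro-tier model costs $1.44, which is the roughly 200-fold gap in the list prices.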

    © 2026 LLM Gateway. All rights reserved.