Best Models for Math

Reasoning models for competition math, quantitative analysis, and step-by-step problem solving

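Provider	Model	Input ($/M tokens)	Output ($/M tokens)	Cached input ($/M tokens)
Granite	glm-5.2	$1.12 (was $1.40, 20% off)	$3.52 (was $4.40, 20% off)	$0.21 (was $0.26, 20% off)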
Z AI	glm-5.2	$1.40	$4.40	$0.26
EmberCloud	glm-5.2	$1.26	$3.96	$0.23
AWS Bedrock(us)	claude-fable-5	$11.00	$55.00	$1.10
AWS Bedrock(global)	claude-fable-5	$10.00	$50.00	$1.00
Anthropic	claude-fable-5	$10.00	$50.00	$1.00
AWS Bedrock	claude-fable-5	$10.00	$50.00	$1.00
MiniMax	minimax-m3	$0.60	$2.40	$0.12
AWS Bedrock(jp)	claude-opus-4-8	$5.50	$27.50	$0.55
AWS Bedrock(us)	claude-opus-4-8	$5.50	$27.50	$0.55
Anthropic	claude-opus-4-8	$5.00	$25.00	$0.50
AWS Bedrock	claude-opus-4-8	$5.00	$25.00	$0.50
AWS Bedrock(global)	claude-opus-4-8	$5.00	$25.00	$0.50
AWS Bedrock(au)	claude-opus-4-8	$5.50	$27.50	$0.55
AWS Bedrock(eu)	claude-opus-4-8	$5.50	$27.50	$0.55
Vertex AI (OpenAI-compatible)	grok-4-20-reasoning	$1.25	$2.50	$0.20
Xiaomi	mimo-v2.5-pro	$0.43	$0.87	$0.00
AWS Bedrock(global)	grok-4-3	$1.25	$2.50	$0.20
AWS Bedrock(us)	grok-4-3	$1.38	$2.75	$0.22
xAI	grok-4-3	$1.25	$2.50	$0.31
AWS Bedrock(us-west-2)	grok-4-3	$1.38	$2.75	$0.22
AWS Bedrock	grok-4-3	$1.25	$2.50	$0.20
Azure AI Foundry	grok-4-3	$1.25	$2.50	$0.20
Alibaba Cloud	qwen3.6-max-preview	$1.30	$7.80	$0.13
Alibaba Cloud(singapore)	qwen3.6-max-preview	$1.30	$7.80	$0.13
OpenAI	gpt-5.5-pro	$30.00	$180.00	—
Azure	gpt-5.5	$5.00	$30.00	$0.50
OpenAI	gpt-5.5	$5.00	$30.00	$0.50
DeepSeek	deepseek-v4-pro	$0.43	$0.87	$0.00
Alibaba Cloud(singapore)	deepseek-v4-pro	$2.40	$4.80	$0.20
Together AI	deepseek-v4-pro	$1.74	$3.48	$0.20
Alibaba Cloud(cn-beijing)	deepseek-v4-pro	$1.65	$3.30	$0.14
Alibaba Cloud	deepseek-v4-pro	$2.40	$4.80	$0.20
DeepInfra	deepseek-v4-pro	$1.74	$3.48	$0.14
Google AI Studio	gemini-pro-latest	$2.00	$12.00	$0.20
Azure	o4-mini	$1.10	$4.40	$0.28
OpenAI	o4-mini	$1.10	$4.40	$0.28
Azure	gpt-5.4-pro	$30.00	$180.00	—
OpenAI	gpt-5.4-pro	$30.00	$180.00	—
Quartz	gemini-3.1-pro-preview	$2.00	$12.00	$0.20
Google AI Studio	gemini-3.1-pro-preview	$2.00	$12.00	$0.20
Google Vertex AI	gemini-3.1-pro-preview	$2.00	$12.00	$0.20
Azure	gpt-5.2-pro	$21.00	$168.00	—
OpenAI	gpt-5.2-pro	$21.00	$168.00	—
Alibaba Cloud	kimi-k2-thinking	$0.57	$2.29	—
Moonshot AI	kimi-k2-thinking	$0.60	$2.50	$0.15
Alibaba Cloud(cn-beijing)	kimi-k2-thinking	$0.57	$2.29	—
Vertex AI (OpenAI-compatible)	kimi-k2-thinking	$0.60	$2.50	$0.06
ByteDance	kimi-k2-thinking	$0.60	$2.50	$0.12
Nebius AI	qwen3-235b-a22b-thinking-2507	$0.20	$0.60	—

Math is where reasoning models earn their keep: spending thinking tokens before answering substantially improves accuracy on competition problems, proofs, and multi-step quantitative work. The strongest options are OpenAI's Pro-tier models, Claude Opus, and Gemini Pro. Open-weight reasoners such as DeepSeek V4 Pro, Qwen's thinking models, and Xiaomi's MiMo offer a much cheaper alternative.

All of them are available through the same API here, so you can tune thinking budgets, compare answers across models, and route easy problems to cheap models while sending the hard ones to a Pro tier.
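As a rough illustration of that routing idea, the sketch below picks a model ID from a difficulty estimate. The model IDs and output prices come from the list above; the `pick_model` function, the 0-to-1 difficulty score, and the two thresholds are hypothetical, and how you estimate difficulty is left open.

```python
# Hypothetical routing sketch. The thresholds and the difficulty score are
# assumptions for illustration, not part of any API.
CHEAP_MODEL = "deepseek-v4-pro"  # $0.87 per million output tokens (DeepSeek)
MID_MODEL = "claude-opus-4-8"    # $25.00 per million output tokens (Anthropic)
PRO_MODEL = "gpt-5.5-pro"        # $180.00 per million output tokens (OpenAI)


def pick_model(difficulty: float) -> str:
    """Map a difficulty estimate in [0, 1] to a model ID."""
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must be between 0 and 1")
    if difficulty < 0.5:
        return CHEAP_MODEL
    if difficulty < 0.85:
        return MID_MODEL
    return PRO_MODEL


if __name__ == "__main__":
    for score in (0.2, 0.6, 0.95):
        print(score, pick_model(score))
```

A cheap first pass that escalates only when the answer fails a check is another way to produce the difficulty signal, but that is a design choice outside this sketch.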

Frequently asked questions

What is the best LLM for math?

GPT-5.5 Pro and GPT-5.4 Pro top most math evaluations, with Claude Opus 4.8 and Gemini 3.1 Pro close behind. DeepSeek V4 Pro and Qwen's 235B thinking model get remarkably close at a fraction of the cost, which makes them the default choice for high-volume math workloads.

Do I need a reasoning model for math?

For anything beyond arithmetic and simple algebra, yes. Reasoning models work through problems step by step before answering and are far more reliable on competition-style and multi-step problems. Most models here let you cap the thinking budget so you control cost per problem.
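To choose a cap, it can help to work backwards from a spending limit per problem. The helper below is a minimal sketch: `max_thinking_tokens` is a hypothetical name, and the request parameter that actually carries the cap differs between providers and is not shown here.

```python
def max_thinking_tokens(budget_usd: float, output_price_per_million: float) -> int:
    """Largest output-token cap that keeps one problem within budget_usd.

    Assumes thinking tokens are billed at the output price and ignores
    input-token cost, which is usually small for a single math problem.
    """
    if budget_usd < 0 or output_price_per_million <= 0:
        raise ValueError("budget must be >= 0 and price must be > 0")
    return int(budget_usd / output_price_per_million * 1_000_000)


if __name__ == "__main__":
    # One cent per problem at two output prices from the list above.
    print(max_thinking_tokens(0.01, 0.87))    # deepseek-v4-pro via DeepSeek
    print(max_thinking_tokens(0.01, 180.00))  # gpt-5.5-pro via OpenAI
```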

Can LLMs be trusted for calculations?

Not blindly. Models still make arithmetic slips inside otherwise-correct reasoning. For production use, pair the model with tool calling so it can call a calculator or run code, and use the LLM to set up and interpret the math rather than for raw number crunching.
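One possible shape for such a calculator tool is sketched below. It evaluates plain arithmetic with Python's standard `ast` module instead of `eval`, so anything other than numbers and basic operators is rejected. The `calculate` name is hypothetical, and registering it as a tool with a particular API is not shown.

```python
import ast
import operator
from typing import Union

Number = Union[int, float]

# Operators the tool accepts; every other syntax node is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def calculate(expression: str) -> Number:
    """Evaluate an arithmetic expression without calling eval().

    Note: exponent size is not limited here; a production tool would
    likely want a guard against very large powers.
    """

    def _eval(node: ast.AST) -> Number:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant):
            # bool is a subclass of int, so exclude it explicitly.
            if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
                return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval"))


if __name__ == "__main__":
    print(calculate("2 + 3 * 4"))  # prints 14
```

The model would supply the expression string and receive the numeric result, which keeps the exact arithmetic out of the sampled text.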

How much do reasoning tokens cost?

Reasoning tokens are billed as output tokens, and hard problems can burn thousands of them, so output price dominates the cost of math workloads. At $0.87 per million output tokens, DeepSeek V4 Pro costs roughly 200 times less per output token than GPT-5.5 Pro at $180.00. Compare output prices in the list above.
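The arithmetic behind that comparison can be sketched as follows. The two output prices are taken from the list above; the figure of 10,000 output tokens per problem is an assumed example, not a measurement, and `cost_per_problem` is a hypothetical helper.

```python
def cost_per_problem(output_tokens: int, price_per_million: float) -> float:
    """Dollar cost of the output (reasoning plus answer) tokens for one problem."""
    return output_tokens / 1_000_000 * price_per_million


if __name__ == "__main__":
    tokens = 10_000  # assumed output tokens for one hard problem
    cheap = cost_per_problem(tokens, 0.87)    # deepseek-v4-pro via DeepSeek
    pro = cost_per_problem(tokens, 180.00)    # gpt-5.5-pro via OpenAI
    print(f"deepseek-v4-pro: ${cheap:.4f}")   # prints $0.0087
    print(f"gpt-5.5-pro:     ${pro:.4f}")     # prints $1.8000
    print(f"ratio: {pro / cheap:.0f}x")       # prints 207x
```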
