Best Models for Translation
Multilingual models with strong translation quality across major and low-resource languages — compared by price and context
| Provider | | | |
|---|---|---|---|
| Google AI Studio | $0.30 | $2.50 | $0.03 |
| Google Vertex AI | $0.30 | $2.50 | $0.03 |
Modern LLMs now rival dedicated translation engines for most language pairs — and beat them on context awareness, tone, terminology consistency, and formatting. The strongest multilingual models are Google's Gemini line, OpenAI's GPT-5.4, Anthropic's Claude, and Alibaba's Qwen, which is particularly strong on Chinese and other Asian languages.
Long context windows also change how translation work gets done: instead of translating strings in isolation, you can put an entire document plus a glossary into one prompt and keep terminology consistent throughout. For bulk workloads, budget models like Gemini Flash-Lite and DeepSeek V4 Flash bring the cost per translated word down to fractions of a cent.
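As a rough illustration of the document-plus-glossary approach, the sketch below assembles a single prompt from a glossary and a full document. The function name `build_translation_prompt` and the prompt wording are assumptions made for this example, not a format any provider requires; the resulting string would be sent through whichever client library you already use.

```python
def build_translation_prompt(document: str, glossary: dict[str, str],
                             source_lang: str, target_lang: str) -> str:
    """Assemble one prompt holding the glossary and the whole document.

    Hypothetical helper: the instruction wording is illustrative only.
    """
    # One glossary entry per line, sorted so the prompt is reproducible.
    glossary_lines = "\n".join(
        f"- {src} => {tgt}" for src, tgt in sorted(glossary.items())
    )
    return (
        f"Translate the document below from {source_lang} to {target_lang}.\n"
        "Use the glossary translation wherever a glossary term appears.\n"
        "Keep formatting, markup, and placeholders unchanged.\n\n"
        f"Glossary:\n{glossary_lines}\n\n"
        f"Document:\n{document}"
    )


prompt = build_translation_prompt(
    "Open the dashboard to see your invoices.",
    {"dashboard": "tableau de bord", "invoice": "facture"},
    "English",
    "French",
)
print(prompt)
```

Keeping the glossary ahead of the document means the same instructions can be reused unchanged when a file later has to be split into several calls.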
Frequently asked questions
What is the best LLM for translation?
Gemini 3.1 Pro and GPT-5.4 deliver the most consistent quality across a broad set of language pairs. Qwen3.7 Max is a top pick for Chinese, Japanese, and Korean, and Claude Sonnet 5 excels when tone and nuance matter. For bulk work, Gemini 2.5 Flash-Lite and DeepSeek V4 Flash offer the best cost per word.
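To make "cost per word" concrete, here is a back-of-the-envelope estimate. It assumes prices are quoted per million tokens with separate input and output rates, and that a word averages about 1.33 tokens (the inverse of the common rule of thumb of 0.75 English words per token). The $0.30 and $2.50 figures are used purely as illustrative rates; real token counts vary by language and tokenizer, and prompt overhead such as a glossary is ignored.

```python
def translation_cost_usd(words: int,
                         input_price_per_mtok: float,
                         output_price_per_mtok: float,
                         tokens_per_source_word: float = 1.33,
                         tokens_per_target_word: float = 1.33) -> float:
    """Estimate the cost of translating `words` source words.

    Assumes the translation has about as many words as the source and
    ignores instruction and glossary tokens. All ratios are assumptions.
    """
    input_tokens = words * tokens_per_source_word
    output_tokens = words * tokens_per_target_word
    total = (input_tokens * input_price_per_mtok
             + output_tokens * output_price_per_mtok)
    return total / 1_000_000


# Illustrative rates only: $0.30 per million input tokens, $2.50 per million output.
cost = translation_cost_usd(1_000_000, 0.30, 2.50)
print(f"1M words: ${cost:.2f}, per word: ${cost / 1_000_000:.8f}")
```

Under these assumptions a million words costs a few dollars, which is where the "fractions of a cent per word" figure comes from.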
Are LLMs better than Google Translate or DeepL?
For most content, yes — LLMs follow style guides, preserve formatting and placeholders, keep terminology consistent across a document, and adapt register on request. Dedicated engines still win on raw speed and per-character price for very simple, high-volume strings.
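Placeholder preservation is easy to verify mechanically, whichever engine produced the translation. The sketch below compares the placeholders found in a source string with those in its translation. The regular expression covers only three common styles (`{name}`, `%s`/`%d`, and simple HTML-like tags) and is an assumption for illustration; adapt it to the placeholder syntax your project actually uses.

```python
import re
from collections import Counter

# Assumed placeholder styles: {name}, %s / %d / %1$s, and <tag> or </tag>.
PLACEHOLDER_RE = re.compile(r"\{[A-Za-z0-9_]*\}|%(?:\d+\$)?[sd]|<[^<>]+>")


def placeholder_diff(source: str, translation: str):
    """Return (missing, extra) placeholder counts for a translated string."""
    src = Counter(PLACEHOLDER_RE.findall(source))
    out = Counter(PLACEHOLDER_RE.findall(translation))
    # Counter subtraction keeps only positive counts.
    return dict(src - out), dict(out - src)


missing, extra = placeholder_diff(
    "Hello {name}, you have %d new <b>messages</b>.",
    "Bonjour {nom}, vous avez %d nouveaux <b>messages</b>.",
)
print("missing:", missing)
print("extra:", extra)
```

A non-empty `missing` or `extra` result is a cheap signal to re-run or manually review that string before shipping it.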
How do I translate long documents?
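Use a long-context model and send the whole document in one call: a million-token window fits roughly 750,000 English words (fewer in languages that tokenize less efficiently), and single-call translation keeps names and terminology consistent. Check the model's maximum output length as well, since it is usually much smaller than the context window and the full translation has to fit within it. If a document exceeds either limit, chunk it and include a running glossary in each prompt.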
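When a document does have to be split, chunking with a running glossary could look roughly like the sketch below. `translate_chunk` is a hypothetical callable standing in for your model call: it is assumed to take a chunk plus the glossary so far and return the translated text together with any new term renderings it chose. Chunk size is measured in words for simplicity; a real pipeline would more likely count tokens.

```python
from typing import Callable, Optional


def chunk_paragraphs(text: str, max_words: int) -> list[str]:
    """Group paragraphs into chunks of at most max_words words.

    A single paragraph longer than max_words becomes its own chunk.
    """
    chunks, current, count = [], [], 0
    for para in text.split("\n\n"):
        n = len(para.split())
        if current and count + n > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def translate_in_chunks(
    text: str,
    translate_chunk: Callable[[str, dict], tuple],  # hypothetical model wrapper
    max_words: int,
    glossary: Optional[dict] = None,
):
    """Translate chunk by chunk, carrying the glossary forward."""
    glossary = dict(glossary or {})
    outputs = []
    for chunk in chunk_paragraphs(text, max_words):
        translated, new_terms = translate_chunk(chunk, glossary)
        outputs.append(translated)
        for term, rendering in new_terms.items():
            # The first rendering wins, so later chunks stay consistent.
            glossary.setdefault(term, rendering)
    return "\n\n".join(outputs), glossary
```

Splitting on paragraph boundaries avoids cutting sentences in half, and letting the first rendering of a term win is one simple consistency policy among several possible ones.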
Which models handle low-resource languages best?
Coverage drops for languages with little training data. Gemini Pro and GPT-5.4 generally hold up best, but always test with your actual language pair before committing volume — with one API key you can run the same text through several models in minutes and compare.
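A side-by-side test of this kind can be as small as the harness below. Each entry in `translators` is a hypothetical callable wrapping one model; the names and the stubbed outputs are placeholders. The score is plain character-level similarity to a human reference from Python's `difflib`, which is a crude stand-in and not an established translation metric, so treat the ranking as a prompt for human review rather than a verdict.

```python
from difflib import SequenceMatcher
from typing import Callable


def compare_translations(text: str, reference: str,
                         translators: dict[str, Callable[[str], str]]):
    """Run `text` through each translator and rank by similarity to `reference`."""
    rows = []
    for name, translate in translators.items():
        candidate = translate(text)
        score = SequenceMatcher(None, reference, candidate).ratio()
        rows.append((name, round(score, 3), candidate))
    # Highest similarity first.
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Stub translators; in practice each would call a different model.
ranking = compare_translations(
    "Hello world",
    "Bonjour le monde",
    {
        "model_a": lambda t: "Bonjour le monde",
        "model_b": lambda t: "Salut monde",
    },
)
for name, score, candidate in ranking:
    print(name, score, candidate)
```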