Speech Generation + Audio Studio
Text-to-speech is live: nine models from ElevenLabs, OpenAI, and Gemini behind the OpenAI-compatible /v1/audio/speech endpoint, plus a new Audio Studio in the Playground to compare voices side by side.

LLM Gateway can now talk. Text-to-speech ships today with nine models across three providers — all behind the same API key, billing, and logs as your chat, image, and video requests.
The /v1/audio/speech Endpoint
A drop-in replacement for OpenAI's audio API — point your existing OpenAI client at the gateway and you're done:
1curl -X POST "https://api.llmgateway.io/v1/audio/speech" \2 -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \3 -H "Content-Type: application/json" \4 -d '{5 "model": "eleven-multilingual-v2",6 "input": "Hello, welcome to LLM Gateway!",7 "voice": "Sarah"8 }' \9 --output speech.mp31curl -X POST "https://api.llmgateway.io/v1/audio/speech" \2 -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \3 -H "Content-Type: application/json" \4 -d '{5 "model": "eleven-multilingual-v2",6 "input": "Hello, welcome to LLM Gateway!",7 "voice": "Sarah"8 }' \9 --output speech.mp3Supports voice, response_format (mp3, wav, opus, aac, flac, pcm — varies by family), style instructions, and speed. See the speech generation docs for the full reference.
ElevenLabs Joins the Gateway
A brand-new provider with four models and 20 named voices:
eleven-multilingual-v2— most lifelike, 29 languages · $0.11 / 1K characterseleven-v3— most expressive, 70+ languages · $0.11 / 1K characterseleven-flash-v2-5— ultra-low latency, 32 languages · $0.055 / 1K characterseleven-turbo-v2-5— fast and balanced · $0.055 / 1K characters
They join OpenAI's tts-1, tts-1-hd, and steerable gpt-4o-mini-tts, plus gemini-2.5-flash-preview-tts and gemini-2.5-pro-preview-tts — 60+ prebuilt voices in total. Browse them all on the models page.
Audio Studio
The Playground gets a third studio at /audio, joining Image and Video:
- Compare mode — generate the same script with up to 3 models in parallel and listen side by side
- Per-model controls — voice, output format, playback speed, and style instructions adapt to each model family
- Org-scoped history — every generation is saved; revisit, rename, or delete past takes
- One-click download straight from the player