
Speech Generation Is Live: ElevenLabs, OpenAI & Gemini TTS Through One API

Nine text-to-speech models from ElevenLabs, OpenAI, and Google are now one OpenAI-compatible API call away — plus a new Audio Studio in the Playground to compare voices and models side by side.

[Image: Sound waves flowing out of a single API endpoint into multiple speech models]

Adding voice to your app used to mean picking one TTS vendor, learning their SDK, and managing yet another API key and invoice. Today that's one decision lighter: LLM Gateway now supports speech generation through the OpenAI-compatible /v1/audio/speech endpoint — with models from ElevenLabs, OpenAI, and Google Gemini behind the same key, billing, and logs you already use for chat, images, and video.

And if you'd rather hear the voices before writing a line of code, the new Audio Studio in the Playground lets you generate speech from up to three models side by side.

One endpoint, nine models, 60+ voices

The endpoint is a drop-in replacement for OpenAI's audio API. If you've used openai.audio.speech.create(), you already know how it works — point the base URL at LLM Gateway and switch models freely:

```typescript
import OpenAI from "openai";
import { writeFileSync } from "fs";

// Point the official OpenAI SDK at LLM Gateway instead of api.openai.com.
const openai = new OpenAI({
  apiKey: process.env.LLM_GATEWAY_API_KEY,
  baseURL: "https://api.llmgateway.io/v1",
});

const response = await openai.audio.speech.create({
  model: "eleven-multilingual-v2",
  voice: "Sarah",
  input: "Hello, welcome to LLM Gateway!",
});

// The response body is the complete audio file; write it to disk.
writeFileSync("speech.mp3", Buffer.from(await response.arrayBuffer()));
```

Or with curl:

```bash
curl -X POST "https://api.llmgateway.io/v1/audio/speech" \
  -H "Authorization: Bearer $LLM_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash-preview-tts",
    "input": "Hello, welcome to LLM Gateway!",
    "voice": "Kore"
  }' \
  --output speech.wav
```

The lineup

| Model | Best for | Pricing |
| --- | --- | --- |
| eleven-multilingual-v2 | Most lifelike voices, rich emotion, 29 languages | $0.11 / 1K characters |
| eleven-v3 | Most expressive delivery, 70+ languages | $0.11 / 1K characters |
| eleven-flash-v2-5 | Ultra-low latency, real-time use, 32 languages | $0.055 / 1K characters |
| eleven-turbo-v2-5 | Fast and balanced, 32 languages | $0.055 / 1K characters |
| gpt-4o-mini-tts | Steerable delivery via instructions | $0.60 / 1M input + $12 / 1M audio tokens |
| tts-1 | Real-time OpenAI TTS | $15 / 1M characters |
| tts-1-hd | Higher-quality OpenAI TTS | $30 / 1M characters |
| gemini-2.5-flash-preview-tts | Natural, controllable speech at low cost | $0.50 / 1M input + $10 / 1M audio tokens |
| gemini-2.5-pro-preview-tts | Highest-quality Gemini speech | $1 / 1M input + $20 / 1M audio tokens |

Between them you get 60+ prebuilt voices: 20 named ElevenLabs voices (Sarah, Roger, Charlotte, Brian, …), 30 Gemini voices (Kore, Puck, Zephyr, Charon, …), and OpenAI's catalog (alloy, ash, coral, nova, verse, …).

Browse the full list with live pricing on the models page.
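To get a feel for what the per-character rates mean in practice, here is a rough sketch that turns the prices in the table into a cost estimate for a given script. The helper name `estimateSpeechCostUsd` is made up for illustration, and the sketch assumes billing is a simple function of `text.length`; the gateway's actual character counting and live prices may differ, so treat the numbers as ballpark figures. Token-priced models are left out because their cost depends on audio token counts that can't be derived from text length alone.

```typescript
// List prices copied from the lineup table, normalized to USD per 1,000 characters.
// Token-priced models (gpt-4o-mini-tts, Gemini TTS) are intentionally omitted.
const USD_PER_1K_CHARS: Record<string, number> = {
  "eleven-multilingual-v2": 0.11,
  "eleven-v3": 0.11,
  "eleven-flash-v2-5": 0.055,
  "eleven-turbo-v2-5": 0.055,
  "tts-1": 0.015, // $15 / 1M characters
  "tts-1-hd": 0.03, // $30 / 1M characters
};

// Hypothetical helper: estimated USD cost of synthesizing `text` with `model`.
// Assumes one billed character per UTF-16 code unit, which is only an approximation.
function estimateSpeechCostUsd(model: string, text: string): number {
  const rate = USD_PER_1K_CHARS[model];
  if (rate === undefined) {
    throw new Error(`No per-character price known for model: ${model}`);
  }
  return (text.length / 1000) * rate;
}

const script = "a".repeat(2500); // stand-in for a 2,500-character script
console.log(estimateSpeechCostUsd("eleven-multilingual-v2", script).toFixed(4)); // 0.2750
console.log(estimateSpeechCostUsd("tts-1", script).toFixed(4)); // 0.0375
```

At these list prices, the same 2,500-character script costs roughly seven times more on eleven-multilingual-v2 than on tts-1, which is the kind of trade-off worth checking by ear before committing to a model.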

Control the delivery

Beyond model, input, and voice, the endpoint accepts:

  • response_format: mp3, wav, opus, aac, flac, or raw pcm, depending on the model family
  • instructions: a style directive like "Speak like a calm narrator" (steerable models such as gpt-4o-mini-tts and the Gemini TTS models shine here)
  • speed: playback speed on OpenAI models

Full parameter reference, format support per family, and billing details are in the speech generation docs.
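As a minimal sketch of how the optional parameters fit together, the snippet below assembles a request body with a style instruction and a non-default output format. The `SpeechRequest` type and `buildSpeechRequest` helper are hypothetical names introduced here, not part of any SDK, and pairing "alloy" with "wav" on gpt-4o-mini-tts is an assumption; check the docs for which formats and options each model family actually accepts.

```typescript
// Fields the post lists for /v1/audio/speech. Which optional ones a given
// model honors is an assumption to verify against the speech generation docs.
interface SpeechRequest {
  model: string;
  input: string;
  voice: string;
  response_format?: "mp3" | "wav" | "opus" | "aac" | "flac" | "pcm";
  instructions?: string;
  speed?: number;
}

// Hypothetical helper: copies only the options that were actually set,
// so the JSON body never carries undefined fields.
function buildSpeechRequest(req: SpeechRequest): Record<string, string | number> {
  const body: Record<string, string | number> = {};
  for (const [key, value] of Object.entries(req)) {
    if (value !== undefined) body[key] = value;
  }
  return body;
}

const narratorRequest = buildSpeechRequest({
  model: "gpt-4o-mini-tts",
  voice: "alloy",
  input: "Hello, welcome to LLM Gateway!",
  instructions: "Speak like a calm narrator",
  response_format: "wav",
});

console.log(JSON.stringify(narratorRequest, null, 2));
```

The resulting object has the same shape as the argument to openai.audio.speech.create() and as the JSON body in the curl example, so it can be used with either.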

Audio Studio: hear it before you ship it

Picking a voice from a table is hopeless — you need to listen. The new Audio Studio at chat.llmgateway.io/audio joins the Image and Video studios in the Playground:

  • Compare mode — run the same script through up to 3 models in parallel and listen side by side
  • Per-model controls — voice picker, output format, playback speed, and style instructions adapt to whatever each model supports
  • History — every generation is saved per organization, so you can revisit, rename, or re-run earlier takes
  • One-click download — pull the audio file straight from the player

Type a sentence, pick eleven-v3, gpt-4o-mini-tts, and gemini-2.5-pro-preview-tts, and you'll know within seconds which voice fits your product.

Why route TTS through the gateway?

The same reasons you route chat through it:

  • One API key for every provider — no separate ElevenLabs, OpenAI, and Google accounts to wire up
  • Unified billing — speech usage lands in the same credit balance and cost analytics as everything else, with per-request cost in your logs
  • Bring your own keys if you have negotiated provider rates, or use gateway credits and skip provider signups entirely
  • One integration — when the next great TTS model ships, it's a one-line model string change

One note: streaming speech output isn't supported yet — the endpoint returns the complete audio file in a single response. Low-latency chunked output is on the roadmap.

Start talking