Back to blog

Q2 2026: Speech, Embeddings & Coding Plans

Three months of updates: speech generation and Audio Studio, OpenAI-compatible embeddings, OCR, DevPass coding plans, chat subscription plans, enterprise IAM and master keys, SOC 2 Type II, 40+ new models, and much more.

Q2 2026 Feature Roundup

Most teams want one API for everything they build with — text, speech, embeddings, images, video — without juggling a separate vendor and SDK for each. Q2 was about closing that gap. LLM Gateway added speech generation and embeddings as first-class OpenAI-compatible endpoints, shipped fixed-price coding plans and chat subscriptions, tightened enterprise access controls, earned SOC 2 Type II, and brought 40+ new models online. Here's everything that shipped from April through June.

By the Numbers

The quarter in traffic, across every project on the platform:

  • 27,038,440 requests routed
  • 207.9B tokens processed — 195.8B input, 120B cached, 11.8B output

DeepSeek V4, Grok 4.1 Fast, and Gemini 3 Flash drove the most volume. But the bigger story is how much of it is no longer plain text: Gemini Embedding 2 ranks second by request count, and GPT Image 2 lands in the top ten — proof that embeddings, images, and audio now move serious traffic through the gateway.

Top 10 models by tokens

# Model Tokens
1 deepseek-v4-pro 48.6B
2 grok-4-1-fast-non-reasoning 29.2B
3 gemini-3-flash-preview 27.0B
4 deepseek-v4-flash 23.0B
5 claude-sonnet-4-6 7.7B
6 claude-opus-4-8 6.4B
7 gemini-3-pro-image-preview 6.2B
8 gemini-embedding-2 5.8B
9 glm-5.2 5.1B
10 claude-opus-4-6 4.6B

Top 10 models by requests

# Model Requests
1 grok-4-1-fast-non-reasoning 10,605,050
2 gemini-embedding-2 5,887,455
3 gemini-3-flash-preview 2,406,179
4 gemini-3-pro-image-preview 2,122,872
5 deepseek-v4-flash 1,158,043
6 gemini-3.1-flash-image-preview 668,547
7 gpt-image-2 543,958
8 deepseek-v4-pro 540,795
9 grok-4-1-fast 300,313
10 deepseek-v3.2 289,820

Speech & Audio

Audio Studio in the LLM Gateway Playground with multi-track speech generation

Text-to-speech now runs through the same gateway as the rest of your stack:

  • /v1/audio/speech — An OpenAI-compatible speech endpoint backed by ElevenLabs, Google Gemini, and more, so you switch voices and providers without changing your code.
  • ElevenLabs provider — Native text-to-speech with per-character pricing tracked on every request.
  • Google audio — Audio support for Google models, wired into the Playground.
  • Audio Studio — A dedicated workspace in the Playground for generating and previewing speech.

Read the speech docs

Embeddings

OpenAI-compatible embeddings turning text into vectors for semantic search and RAG

/v1/embeddings is now OpenAI-compatible across providers, so retrieval and semantic search work without provider-specific glue:

  • Google embeddings via gemini-embedding-001, plus Google Vertex embeddings
  • Same-provider key fallback — Embedding requests fail over to your other keys on the same provider
  • Routing metadata and key health included in embedding responses, just like chat completions

Read the embeddings docs

OCR

  • /v1/ocr — Extract structured text from documents and images with mistral-ocr-latest.
  • Chat OCR — The Playground reads text out of uploaded images directly in a conversation.

Read the OCR docs

Video Generation

ByteDance Seedance video generation models in the LLM Gateway model selector

We expanded video beyond Q1's launch with new models and input modes:

  • ByteDance Seedance 2.0, 2.0 Fast, and 1.5 Pro — including reference-video input and first/last-frame control on Seedance 2.0
  • Alibaba Wan 2.6 — text-to-video
  • MiniMax Hailuo 2.3
  • AtlasCloud Kling v3
  • Grok Imagine Video 1.5 — promoted out of preview

Image Generation

Image Studio in the LLM Gateway Playground generating images with GPT Image 2

  • gpt-image-2 — Added from OpenAI and via Azure OpenAI, with quality and size pass-through for accurate, resolution-based pricing.
  • Reve — A new image-generation provider.

Responses API

  • /v1/responses — Full support for OpenAI's Responses API.
  • /v1/responses/compact — A compact variant for smaller payloads.
  • item_reference resolution — Input items referenced by ID are resolved server-side.

DevPass: Coding Plans

DevPass gives you a fixed monthly price for coding agents like Claude Code, Codex, Cline, and Cursor — frontier models without metered per-token billing. Q2 turned it into a complete product:

  • Restricted to coding agents and root-model routing — Plans cover inference for supported agents, keeping pricing predictable
  • Annual billing alongside monthly
  • Invoices and shared billing details across the dashboard
  • Public DevPass profiles to show off what you've built
  • Social and passkey sign-in
  • Cancellation feedback flow and lifecycle notifications
  • New integration guides for Pi, Continue, Hermes, and Cursor plan mode

Get your DevPass

Chat Subscription Plans & Playground

Chat subscription plans, service tiers, and the SDK sandbox

The chat Playground gained Starter, Plus, and Pro subscription plans plus a wave of workflow features:

  • Forking, message editing, and chat reset — Branch a conversation or rewind it
  • Temporary chats that leave no history, and pinned chats in the sidebar
  • Public share links with a redesigned share dialog and OpenGraph images
  • Chat history search across every conversation
  • Comparison mode persistence — Your multi-model setup sticks between sessions
  • AI chat support replacing Crisp, with suggested answers and one-click human escalation
  • Image, Video, and Audio Studios plus a Canvas page for longer-form work

Open the Playground

Routing & Reliability

Routing got smarter about cost, latency, and stickiness:

  • Per-request and per-project routing strategy — Choose how the gateway picks providers at either level
  • Sticky session routing via the x-session-id header, so a conversation stays on one provider
  • Stable preferred-provider routing for predictable provider selection
  • Image-aware token estimates feed auto-routing for more accurate cost weighting
  • Provider service tiers — Flex and priority tiers (including Vertex), gated to your own provider keys
  • Faster provider-downtime reaction and AWS Bedrock region routing with a global default

Read the routing docs

Enterprise & Security

Integrated guardrails and custom rules for enterprise safety policies

  • SOC 2 Type II — LLM Gateway completed its SOC 2 Type II audit. Read the announcement
  • IAM rules — Restrict API keys by IP CIDR range (Enterprise)
  • Master keys — Provision and manage keys programmatically, with dedicated IAM rule routes
  • Per-key custom model catalog — Enterprise organizations expose a curated model list per key
  • Per-project routing overrides — Pin providers and policies at the project level
  • Provider compliance policies and legal metadata surfaced per provider
  • Guardrails redact action — Mask sensitive content instead of blocking it outright
  • Enterprise trial and lifted seat, project, and key limits for enterprise plans

Explore Enterprise

API Key Lifecycle

  • TTL expiration — Set an expiry on any API key
  • Roll secret — Rotate a key's secret without changing its ID or breaking integrations
  • See our API key rotation guide for the full pattern

Embeddable Payments SDK

For platforms that resell or meter LLM usage to their own users:

  • Embeddable end-user wallets — Give your users their own credit balances
  • SDK settings and sandbox test keys for safe local development
  • Opt-in preview behind a feature flag

Read the payments SDK docs

New Models

Q2 added more than 40 models across providers:

Anthropic

  • Claude Opus 4.8 (Anthropic and AWS Bedrock)
  • Claude Opus 4.7 with adaptive thinking
  • Adaptive thinking for Opus 4.6
  • Claude Sonnet 4.6 with a 1M-token context window
  • Claude Fable 5 (Anthropic and AWS Bedrock)

OpenAI

  • GPT-5.5 family
  • gpt-image-2 (OpenAI and Azure OpenAI)

xAI

  • Grok 4.3 and Grok Build 0.1, plus grok-4.20 via Vertex AI
  • Grok Imagine Video 1.5

DeepSeek

  • DeepSeek V4 Pro and V4 Flash across Alibaba, Novita, and CanopyWave
  • DeepSeek V4 in Alibaba's Singapore region
  • Reasoning enabled for DeepSeek V3.2 on Novita

Open & Frontier Models

  • GLM-5.1 and GLM-5.2 across Z.ai, EmberCloud, Together AI, and Novita
  • Kimi K2.6, K2.7 Highspeed, and K2.7 Code across Moonshot, CanopyWave, Novita, and Together AI
  • MiniMax M3 and tool calling on MiniMax M2.7
  • Qwen3.6 (Max Preview, Plus, 35B-A3B) and Qwen3.7 (Max, Plus) across Alibaba and Novita
  • Gemma 4, Gemini 3.5 Flash, and Gemini 3.1 Flash Lite
  • Nemotron 3 Ultra 550B, Xiaomi MiMo, and Sakana fugu-ultra

Explore all models

New Providers

  • ElevenLabs — Text-to-speech
  • Reve — Image generation
  • DeepInfra — Inference provider
  • Bluestone and extended Together AI coverage
  • vertex-anthropic and a discounted anthropic-discount provider
  • Azure AI Foundry — Grok models, gpt-oss-120b, and custom Foundry deployment names
  • Vertex AI partner models — 13 new OpenAI-compatible mappings

Browse all providers

Analytics & Admin

Usage analytics and cost breakdown in the LLM Gateway dashboard

  • Organization analytics — Member, API-key, and per-source usage breakdowns
  • Hourly history rollups — Faster long-range charts, with hourly buckets beyond 24 hours in the admin dashboard
  • Model categorization and weekly fair-use caps for premium-tier models
  • A steady stream of admin dashboard improvements: cost-share views, sortable provider tables, error breakdowns by source, and custom date ranges

Billing & Payments

  • Cache-write billing for Anthropic, AWS Bedrock, and Alibaba
  • International payment fee handling
  • Credit top-up minimum raised to $10

Deployment

  • Helm chart — Self-host LLM Gateway on Kubernetes with a maintained chart. See the self-hosting guide.

Docs, SEO & Comparisons

  • Fumadocs upgrade with Ask AI — Ask questions against the docs in natural language
  • Enriched llms.txt and a sitemap page for AI crawlers
  • Refreshed comparison pages with provider logos and OpenGraph images
  • Community model ratings — Rate any model after 100 requests
  • New enterprise SEO pages and a growing library of guides

Explore all models | Try the Playground | Get started now