Embeddings on LLM Gateway: One API for Vectors and Chat
Generate vectors for semantic search, clustering, and RAG through the same gateway you already use for chat. OpenAI-compatible, drop-in, and tracked alongside your model spend.

Most teams treat embeddings as a separate problem. Chat traffic goes through one client, one budget, one observability stack. Vectors go through another — a different SDK, a different key, a different bill, often a different provider entirely. Two pipelines for what's fundamentally the same job: turning text into something a model can reason about.
Starting today, that split is gone. LLM Gateway now exposes an OpenAI-compatible /v1/embeddings endpoint. The same base URL and the same API key that handle your chat completions now handle your vectors.
What changed
If you're already using the OpenAI SDK against LLM Gateway, embeddings work with zero code changes:
```ts
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.LLM_GATEWAY_API_KEY,
  baseURL: "https://api.llmgateway.io/v1",
});

const response = await client.embeddings.create({
  model: "text-embedding-3-small",
  input: "The quick brown fox jumps over the lazy dog.",
});

console.log(response.data[0].embedding);
```
That's it. The same client object that streams chat completions now returns vectors.
Why this matters
Embeddings are the quiet workhorse behind most production LLM features:
- Semantic search — match queries to documents by meaning, not keywords
- RAG pipelines — retrieve the right context before you generate
- Clustering and deduplication — group similar content at scale
- Recommendations — surface "more like this" without hand-tuned rules
- Classification — route, tag, or moderate by similarity
When chat and embeddings live in different systems, every one of those features carries a hidden tax: two sets of keys to rotate, two bills to reconcile, two dashboards to check, two outages to handle. Consolidating them isn't glamorous — it's just one less thing that can go wrong at 2 a.m.
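To make the semantic-search case concrete, here's a minimal sketch that embeds a query and a few documents through the gateway and ranks the documents by cosine similarity. The documents, query, and model choice are just placeholders; swap in whichever embedding model you've picked.

```ts
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.LLM_GATEWAY_API_KEY,
  baseURL: "https://api.llmgateway.io/v1",
});

// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const documents = [
  "How to rotate an API key",
  "Set a monthly budget for a project",
  "Stream chat completions with the OpenAI SDK",
];
const query = "change my credentials";

// One request embeds the query and all documents (input accepts an array).
const { data } = await client.embeddings.create({
  model: "text-embedding-3-small",
  input: [query, ...documents],
});

const [queryVector, ...documentVectors] = data.map((d) => d.embedding);

// Rank documents by similarity to the query, highest first.
const ranked = documents
  .map((text, i) => ({ text, score: cosineSimilarity(queryVector, documentVectors[i]) }))
  .sort((a, b) => b.score - a.score);

console.log(ranked);
```

In production you'd precompute the document vectors and store them in a vector database, embedding only the query per request, but the shape of the call stays the same.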
Billing and observability
Embedding requests show up in the same activity log as your chat traffic, with the same per-request cost breakdown. A few things to know:
- Embeddings are billed on input tokens only — there are no output tokens, since the response is a fixed-size vector
- Costs roll into the same project budget you already use for chat
- The same API key permissions, rate limits, and provider routing rules apply
If you have an LLM Gateway dashboard open right now, embeddings traffic will appear in it without any setup.
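The response itself carries the token count that the cost breakdown is based on. Assuming the gateway passes through the standard OpenAI `usage` field (the response is OpenAI-compatible), you can log it per request, reusing the `client` from the first snippet:

```ts
// `client` is the OpenAI client pointed at LLM Gateway in the first snippet.
const response = await client.embeddings.create({
  model: "text-embedding-3-small",
  input: "How many tokens does this sentence cost to embed?",
});

// Embeddings have no output tokens, so prompt_tokens is the whole bill for this request.
console.log(response.usage.prompt_tokens);
console.log(response.usage.total_tokens); // equal to prompt_tokens for embeddings
```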
Getting started
- Pick a model. Browse the embedding-capable models — `text-embedding-3-small` is a strong default for most use cases.
- Point your OpenAI client at LLM Gateway (if it isn't already): `baseURL: "https://api.llmgateway.io/v1"`.
- Call `embeddings.create()`. That's the whole integration.
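For indexing jobs, such as embedding a document corpus ahead of RAG retrieval, `input` also accepts an array of strings, so a batch of chunks can go out in a single request. A sketch with placeholder chunk data, again reusing the `client` configured above:

```ts
// Reusing the `client` pointed at LLM Gateway in the first snippet.
const chunks = [
  { id: "doc-1#0", text: "Invoices are generated on the first of each month." },
  { id: "doc-1#1", text: "API keys can be rotated from the project settings page." },
  { id: "doc-2#0", text: "Rate limits apply per key and reset every minute." },
];

const { data } = await client.embeddings.create({
  model: "text-embedding-3-small",
  input: chunks.map((c) => c.text),
});

// data[i] corresponds to input[i]; pair each vector with its chunk id
// before writing the records to your vector store.
const records = chunks.map((chunk, i) => ({
  id: chunk.id,
  vector: data[i].embedding,
}));

console.log(records.length); // 3
```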
You'll find the full reference and additional examples in the embeddings docs.
One API. One key. One bill. Chat and vectors, finally in the same place.