Building a Slack Q&A Bot with LLM Gateway and Chat SDK

A walkthrough of our new open-source template: a Slack bot that streams AI answers, keeps thread context, and searches the web — backed by LLM Gateway so you can switch between 200+ models with one API key.

June 18, 2026

Most teams already live in Slack. So when someone has a question — "what's the difference between TCP and UDP?", "summarize this thread for me", "what changed in the latest Next.js release?" — the lowest-friction place to ask it is the channel they're already typing in, not a separate tab.

We built a Slack Q&A bot template for exactly that. Mention it, open its assistant pane, or DM it, and it streams an answer back, remembers the thread, and cites its sources. It's open source, and because it routes through LLM Gateway, you can point it at any of 200+ models with a single API key.

This post is a walkthrough of how it works and the decisions behind it.

Scaffold it in one command

```bash
npx @llmgateway/cli init --template slack-qa-bot
```

The whole bot is about 200 lines of TypeScript across four files, plus a local dev entrypoint:

```text
src/
  index.ts            Hono app with HTTP routes
  bot.ts              Chat SDK bot instance and event handlers
  lib/
    ai.ts             LLM Gateway provider + ToolLoopAgent + answer() stream helper
    state.ts          Redis state adapter (subscriptions + locking)
    local.ts          Local dev server entrypoint
```

The stack: Chat SDK for the Slack plumbing, the AI SDK for the agent and streaming, the LLM Gateway provider for model access, Hono for the HTTP server, and Redis for state.

The webhook: one route, no boilerplate

Slack delivers everything — mentions, DMs, assistant events, interactions — to a single webhook. With Chat SDK, the entire HTTP surface is one Hono handler:

```typescript
import { Hono } from "hono";
import { bot } from "./bot.js";

const app = new Hono();

// Health check
app.get("/", (c) => c.json({ bot: "qa-bot", status: "ok" }));

// Single webhook route; the platform segment selects the adapter
app.post("/api/webhooks/:platform", (c) => {
  const platform = c.req.param("platform");
  if (platform !== "slack") {
    return c.json({ error: `Unknown platform: ${platform}` }, 404);
  }
  return bot.webhooks.slack(c.req.raw);
});

export default app;
```

bot.webhooks.slack handles the parts of Slack integration that are tedious and easy to get subtly wrong: signature verification, the URL verification challenge, event deduplication, and Slack's infamous three-second acknowledgement window. You hand it the raw request and get back a response.
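To make the first of those concrete, here is a minimal sketch of the signature check that Slack documents for incoming requests: an HMAC-SHA256 over `v0:{timestamp}:{body}` keyed with the app's signing secret, compared against the `X-Slack-Signature` header. It is not Chat SDK's implementation; the function name, parameter names, and the five-minute replay window are assumptions made for illustration.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper (not part of Chat SDK): checks a Slack request signature.
function verifySlackSignature(params: {
  signingSecret: string;
  timestamp: string; // value of the X-Slack-Request-Timestamp header
  rawBody: string; // the unparsed request body
  signature: string; // value of the X-Slack-Signature header
  nowSeconds?: number; // injectable clock, useful in tests
}): boolean {
  const now = params.nowSeconds ?? Math.floor(Date.now() / 1000);
  const ts = Number(params.timestamp);

  // Reject stale requests to limit replay attacks (assumed 5-minute window).
  if (!Number.isFinite(ts) || Math.abs(now - ts) > 60 * 5) {
    return false;
  }

  const base = `v0:${params.timestamp}:${params.rawBody}`;
  const expected =
    "v0=" + createHmac("sha256", params.signingSecret).update(base).digest("hex");

  // Compare in constant time; timingSafeEqual requires equal lengths.
  const a = Buffer.from(expected, "utf8");
  const b = Buffer.from(params.signature, "utf8");
  return a.length === b.length && timingSafeEqual(a, b);
}
```

The check has to run against the raw body, before any JSON parsing, which is why the route hands `c.req.raw` to the adapter rather than a parsed payload.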

Notice the route is parameterized as :platform. That's deliberate — adding Microsoft Teams or Google Chat later means registering another adapter, not rewriting the server.

The bot: four handlers cover every entry point

The bot itself is a single Chat instance with a Slack adapter and a Redis-backed state store:

```typescript
export const bot = new Chat({
  adapters: {
    slack: createSlackAdapter(),
  },
  state,
  userName: "qa-bot",
});
```

From there, four event handlers cover every way a user can reach the bot.

Channel mentions subscribe to the thread, then answer:

```typescript
bot.onNewMention(async (thread, message) => {
  await thread.subscribe();
  await respond(thread, message);
});
```

That thread.subscribe() call is the key to good conversational UX. After the first mention, the bot keeps answering follow-up messages in that thread without needing to be re-mentioned every time:

```typescript
bot.onSubscribedMessage(async (thread, message) => {
  // Let users opt out of follow-ups in this thread
  if (UNSUBSCRIBE_PATTERN.test(message.text)) {
    await thread.unsubscribe();
    await thread.post(
      "Got it — I'll stop following this thread. Mention me anytime.",
    );
    return;
  }
  await respond(thread, message);
});
```

Reply stop or unsubscribe and the bot leaves. Direct messages are treated as implicit mentions, so they subscribe and answer too. Finally, onAssistantThreadStarted wires up Slack's Assistants API with a few suggested prompts so the assistant pane isn't a blank box.
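The excerpt above doesn't show how `UNSUBSCRIBE_PATTERN` is defined. One plausible shape, offered as an assumption rather than the template's actual regex, matches a message that consists only of the keyword, so that a question merely containing the word "stop" still gets answered:

```typescript
// Assumed definition: the template's real pattern may differ.
// Matches "stop" or "unsubscribe" alone, case-insensitively, allowing
// surrounding whitespace and one trailing "." or "!".
const UNSUBSCRIBE_PATTERN = /^\s*(stop|unsubscribe)\s*[.!]?\s*$/i;

// Convenience wrapper used only in this sketch.
function isUnsubscribeRequest(text: string): boolean {
  return UNSUBSCRIBE_PATTERN.test(text);
}
```

Anchoring the pattern at both ends is the design choice that matters here: an unanchored `/stop/i` would unsubscribe on "how do I stop a Docker container?".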

Context: turn a Slack thread into model messages

An answer is only as good as the context behind it. Before calling the model, the bot pulls recent thread history and converts it into AI SDK messages:

```typescript
const buildPrompt = async (thread, message) => {
  try {
    // Pull recent thread history and convert it to AI SDK messages
    const { messages } = await thread.adapter.fetchMessages(thread.id, {
      limit: HISTORY_LIMIT,
    });
    const history = await toAiMessages(messages, { includeNames: true });
    if (history.length > 0) {
      return history;
    }
  } catch (error) {
    console.error(
      "Failed to fetch thread history; using latest message",
      error,
    );
  }
  // Fallback: answer from the latest message alone
  return message.text;
};
```

toAiMessages does the unglamorous-but-important work of mapping Slack's message shape onto the AI SDK's user/assistant role format. includeNames: true prefixes each message with the speaker ([alice]: ...), so in a busy multi-person thread the model knows who said what. If the history fetch fails for any reason, the bot falls back to the latest message rather than erroring out — a small reliability touch that matters in production.
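As a rough illustration of that mapping, the sketch below converts a simplified thread into role-tagged messages and applies the name prefix. It is not Chat SDK's `toAiMessages`; the `ThreadMessage` shape, its field names, and the function name are hypothetical stand-ins.

```typescript
// Hypothetical, simplified message shapes for illustration only.
type ThreadMessage = { authorName: string; isBot: boolean; text: string };
type SketchAiMessage = { role: "user" | "assistant"; content: string };

function toMessagesSketch(
  messages: ThreadMessage[],
  options: { includeNames: boolean },
): SketchAiMessage[] {
  return messages
    // Skip empty messages (for example, file-only posts).
    .filter((m) => m.text.trim().length > 0)
    .map((m): SketchAiMessage =>
      m.isBot
        ? // The bot's own replies become assistant turns, without a prefix.
          { role: "assistant", content: m.text }
        : {
            role: "user",
            content: options.includeNames
              ? `[${m.authorName}]: ${m.text}`
              : m.text,
          },
    );
}
```

With `includeNames: true`, two people asking in the same thread stay distinguishable even though both map to the single `user` role.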

Streaming: use `fullStream`, not `textStream`

The model lives in ai.ts, built on the AI SDK's ToolLoopAgent:

```typescript
export const gateway = createLLMGateway();
// Model ID comes from AI_MODEL, with a default
export const modelId = process.env.AI_MODEL ?? "anthropic/claude-sonnet-4-6";
// Assumed definition (not shown in the original excerpt): on unless WEB_SEARCH=false
const webSearchEnabled = process.env.WEB_SEARCH !== "false";

export const agent = new ToolLoopAgent({
  instructions: SYSTEM_PROMPT,
  model: gateway(
    modelId,
    webSearchEnabled ? { extraBody: { web_search: true } } : {},
  ),
});

// Returns the full stream (text plus step boundaries), not just text
export const answer = async (prompt: string | AiMessage[]) => {
  const result = await agent.stream({ prompt });
  return result.fullStream;
};
```

createLLMGateway() reads LLM_GATEWAY_API_KEY from the environment automatically — there's no key handling in your code. The bot then streams the answer straight into Slack:

```typescript
const respond = async (thread, message) => {
  await thread.startTyping();
  try {
    const prompt = await buildPrompt(thread, message);
    // Post the stream directly; Chat SDK handles incremental updates
    await thread.post(await answer(prompt));
  } catch (error) {
    await thread.post(ERROR_MESSAGE);
  }
};
```

One detail worth calling out: answer() returns the agent's fullStream, not textStream. The full stream includes step boundaries, which Chat SDK turns into clean paragraph breaks as it posts into Slack. Pipe the text-only stream instead and a multi-step answer arrives as one undifferentiated wall of text. Chat SDK uses Slack's native streaming where it's available and falls back to post-then-edit elsewhere, so the implementation stays the same across platforms.
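The difference is easy to see with a toy consumer. The sketch below uses a simplified, hypothetical part shape (`text-delta`, `finish-step`, `tool-call`) rather than the AI SDK's full stream types, and the rendering functions are illustrations of the idea, not Chat SDK's code.

```typescript
// Simplified, hypothetical stream parts; the real fullStream has more variants.
type StreamPart =
  | { type: "text-delta"; text: string }
  | { type: "finish-step" }
  | { type: "tool-call"; toolName: string };

// Full-stream rendering: a step boundary becomes a paragraph break
// before the next piece of text.
async function renderWithBreaks(parts: AsyncIterable<StreamPart>): Promise<string> {
  let out = "";
  let pendingBreak = false;
  for await (const part of parts) {
    if (part.type === "text-delta") {
      if (pendingBreak && out.length > 0) out += "\n\n";
      pendingBreak = false;
      out += part.text;
    } else if (part.type === "finish-step") {
      pendingBreak = true;
    }
  }
  return out;
}

// Text-only rendering: step boundaries are invisible, so text runs together.
async function renderTextOnly(parts: AsyncIterable<StreamPart>): Promise<string> {
  let out = "";
  for await (const part of parts) {
    if (part.type === "text-delta") out += part.text;
  }
  return out;
}

// A two-step answer: some text, a tool call, then more text.
async function* demoStream(): AsyncGenerator<StreamPart> {
  yield { type: "text-delta", text: "Let me check." };
  yield { type: "finish-step" };
  yield { type: "tool-call", toolName: "web_search" };
  yield { type: "text-delta", text: "UDP is " };
  yield { type: "text-delta", text: "connectionless." };
  yield { type: "finish-step" };
}
```

Fed the same demo stream, the first renderer separates the two steps with a blank line, while the second glues "Let me check." directly onto "UDP is connectionless.".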

Web search, served by the gateway

The bot can answer questions about current events and recent releases because LLM Gateway runs web search server-side. You opt in by setting one flag on the request body:

```typescript
gateway(modelId, { extraBody: { web_search: true } });
```

The provider passes extraBody straight through to the gateway, which performs the search and feeds results back to the model — no search API to integrate, no tool to wire up. It's on by default in the template; set WEB_SEARCH=false to turn it off.

There's one sharp edge worth documenting honestly. In streaming mode, the provider doesn't forward url_citation annotations as AI SDK source parts, so result.sources comes back empty. Rather than fight that, the system prompt simply asks the model to cite inline as it writes:

When you rely on web results, cite your sources inline as markdown links, e.g. [Anthropic](https://anthropic.com).

Slack renders those as clickable links, and you get citations without depending on a stream feature that isn't there yet.
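For context, Slack's own mrkdwn format writes links as `<url|label>` rather than `[label](url)`, so something between the model and Slack has to translate. The sketch below shows that translation in its simplest form. It assumes, without confirmation from the template, that the Slack adapter performs an equivalent conversion; the function name is hypothetical and the regex ignores edge cases such as parentheses inside URLs.

```typescript
// Hypothetical helper: rewrites markdown links as Slack mrkdwn links.
// [label](https://example.com) becomes <https://example.com|label>
function markdownLinksToSlack(markdown: string): string {
  return markdown.replace(
    /\[([^\]]+)\]\((https?:\/\/[^\s)]+)\)/g,
    "<$2|$1>",
  );
}
```

Only `http` and `https` targets are rewritten, which leaves anything else in brackets untouched.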

State: subscriptions and locking, both in Redis

Two distinct jobs run through the Redis state adapter, and the second one is easy to forget until it bites you:

```typescript
export const state = createRedisState();
```

The first job is thread subscriptions — the set of threads the bot is actively following, which is what makes the no-re-mention follow-ups work. The second is distributed locking. On a serverless platform, Slack's retries and multiple warm instances mean the same webhook event can land twice. The lock guarantees that two instances never process the same event in parallel, so users never get a duplicate answer. createRedisState() reads REDIS_URL and handles both.
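The locking half can be sketched with the classic set-if-absent pattern, which Redis exposes as `SET key value NX PX <ttl>`. The code below is an illustration under assumptions, not the state adapter's implementation: the `LockStore` interface, the in-memory stand-in, the key format, and the 60-second TTL are all hypothetical, chosen so the example runs without a Redis server.

```typescript
// Hypothetical minimal interface; a Redis-backed version would issue
// `SET key value NX PX ttlMs` and report whether the key was set.
interface LockStore {
  setIfAbsent(key: string, value: string, ttlMs: number): Promise<boolean>;
}

// In-memory stand-in so the sketch is runnable without Redis.
class InMemoryLockStore implements LockStore {
  private expiries = new Map<string, number>();

  async setIfAbsent(key: string, _value: string, ttlMs: number): Promise<boolean> {
    const now = Date.now();
    const expiry = this.expiries.get(key);
    if (expiry !== undefined && expiry > now) return false; // lock still held
    this.expiries.set(key, now + ttlMs);
    return true;
  }
}

// Runs the handler only if this caller wins the lock for the event.
// Returns true if the handler ran, false if the event was a duplicate.
async function processOnce(
  store: LockStore,
  eventId: string,
  handler: () => Promise<void>,
): Promise<boolean> {
  const acquired = await store.setIfAbsent(`event:${eventId}`, "1", 60_000);
  if (!acquired) return false;
  await handler();
  return true;
}
```

The TTL keeps a crashed instance from holding the lock forever; a retry that arrives after expiry would be processed again, so the window should comfortably exceed Slack's retry schedule.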

Deploy anywhere

Because the app is a standard Hono fetch handler, it deploys to any fetch-compatible runtime — Vercel, Cloudflare Workers, AWS, or your own box. The local.ts entrypoint wraps it with @hono/node-server for local development only:

```typescript
import { serve } from "@hono/node-server";
import app from "../index.js";

// Local development only: serve the Hono app over Node's HTTP server
serve({ fetch: app.fetch, port: Number(process.env.PORT) || 3000 });
```

In development, ngrok http 3000 gives you a public URL to paste into your Slack app's Event Subscriptions, and you're live.

Why route through a gateway

The default model is anthropic/claude-sonnet-4-6, but it's just an environment variable:

```bash
AI_MODEL=openai/gpt-4o
AI_MODEL=google/gemini-2.5-pro
AI_MODEL=anthropic/claude-opus-4-6
```

That single line of indirection is the whole point. The same LLM_GATEWAY_API_KEY reaches every provider, so swapping models is a config change, not a code change or a new vendor contract. You get one bill, one place to watch spend and latency, and built-in failover if a provider has a bad day. For a bot that a whole team will lean on, being able to chase the best price-performance model without touching the deploy is exactly the kind of leverage a gateway is for.

One bot, many platforms

Chat SDK's adapters mean the same four handlers can answer on Microsoft Teams, Google Chat, Discord, or Telegram. You register another adapter and extend the webhook route — the answering logic doesn't change:

```typescript
export const bot = new Chat({
  adapters: {
    slack: createSlackAdapter(),
    teams: createTeamsAdapter(),
    gchat: createGoogleChatAdapter(),
  },
  state,
  userName: "qa-bot",
});
```
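Extending the webhook route could then look something like the sketch below, which replaces the hard-coded `platform !== "slack"` check with a lookup table. It is written against the standard `Request` and `Response` types so it runs on its own; the `dispatchWebhook` name and the handler map are hypothetical, and in the real app the map's values would presumably be the per-platform handlers on `bot.webhooks`.

```typescript
type WebhookHandler = (request: Request) => Promise<Response>;

// Hypothetical dispatcher: picks the handler for a platform, or returns 404.
async function dispatchWebhook(
  handlers: Record<string, WebhookHandler>,
  platform: string,
  request: Request,
): Promise<Response> {
  // Own-property check so names like "constructor" are not treated as platforms.
  const known = Object.prototype.hasOwnProperty.call(handlers, platform);
  if (!known) {
    return new Response(
      JSON.stringify({ error: `Unknown platform: ${platform}` }),
      { status: 404, headers: { "content-type": "application/json" } },
    );
  }
  return handlers[platform](request);
}
```

Adding a platform then means adding one entry to the map, and unknown platforms keep getting the same 404 the original route returns.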

Try it

The template is open source and ships with tests, a Slack manifest for one-click app setup, and a Vercel deploy button.

```bash
npx @llmgateway/cli init --template slack-qa-bot
```

Grab an LLM Gateway API key, point AI_MODEL at whatever you want to try first, and you'll have a question-answering bot in your workspace in a few minutes. Browse the rest of the templates for more ways to build on LLM Gateway.
