Back to guides

Kimi Code Integration

Use GPT-5, Claude, Gemini, or any model with Kimi Code CLI. Custom provider configuration, full cost tracking.

Kimi Code CLI is an open-source, AI-powered coding agent developed by Moonshot AI designed to automate software development tasks directly within your terminal. It can read and edit code, execute shell commands, search files, and autonomously manage complex coding workflows.

By configuring Kimi Code CLI to use LLM Gateway, you can point it at any model—GPT-5, Gemini, Llama, Claude, or 210+ others—while keeping the same API formats Kimi Code expects, with full cost tracking in your dashboard.

Prerequisites

  • An LLM Gateway API key — sign up free (no credit card required)

Setup

Step 1: Install Kimi Code CLI

If you haven't already, install Kimi Code CLI.

  • macOS or Linux:

    1curl -fsSL https://code.kimi.com/kimi-code/install.sh | bash
  • Homebrew (macOS/Linux):

    1brew install kimi-code
  • Windows (PowerShell):

    1irm https://code.kimi.com/kimi-code/install.ps1 | iex

Confirm the installation:

1kimi --version

Step 2: Configure config.toml

Create or edit your Kimi Code configuration file at ~/.kimi-code/config.toml (on Windows, this is typically under C:\Users<YourUsername>.kimi-code\config.toml).

Add the llmgateway provider and define the models you want to use. Here is an example configuration that sets up GPT-5.5, Claude Opus 4.6, DeepSeek V4 Pro, MiniMax M3, and Qwen3.7 Max:

1default_model = "llmgateway/gpt-5.5"2
3[providers.llmgateway]4type = "openai"5api_key = "llmgtwy_your_api_key_here"6base_url = "https://api.llmgateway.io/v1"7
8[models."llmgateway/gpt-5.5"]9provider = "llmgateway"10model = "gpt-5.5"11max_context_size = 105000012max_output_size = 12800013capabilities = [ "image_in", "thinking", "tool_use" ]14display_name = "GPT-5.5"15
16[models."llmgateway/claude-opus-4-6"]17provider = "llmgateway"18model = "claude-opus-4-6"19max_context_size = 100000020max_output_size = 12800021capabilities = [ "image_in", "thinking", "tool_use" ]22display_name = "Claude Opus 4.6"23
24[models."llmgateway/deepseek-v4-pro"]25provider = "llmgateway"26model = "deepseek-v4-pro"27max_context_size = 105000028max_output_size = 39321629capabilities = [ "thinking", "tool_use" ]30display_name = "DeepSeek V4 Pro"31
32[models."llmgateway/minimax-m3"]33provider = "llmgateway"34model = "minimax-m3"35max_context_size = 104857636max_output_size = 13107237capabilities = [ "image_in", "thinking", "tool_use" ]38display_name = "MiniMax M3"39
40[models."llmgateway/qwen3.7-max"]41provider = "llmgateway"42model = "qwen3.7-max"43max_context_size = 100000044max_output_size = 6553645capabilities = [ "thinking", "tool_use" ]46display_name = "Qwen3.7 Max"

Configuring config.toml

Replace llmgtwy_your_api_key_here with your actual LLM Gateway API key from the dashboard.

Step 3: Run Kimi Code CLI

Navigate to your project folder and launch the interactive terminal:

1kimi

All requests will now be routed through LLM Gateway, allowing you to use advanced models for local autonomous coding while showing real-time usage and cost statistics on your LLM Gateway dashboard.

Running Kimi Code with LLM Gateway

Configuration Details

The Providers Section

To connect to LLM Gateway, define a custom provider with type = "openai" and specify the base URL pointing to the LLM Gateway endpoint.

1[providers.llmgateway]2type = "openai"3api_key = "llmgtwy_your_api_key_here"4base_url = "https://api.llmgateway.io/v1"

Defining Custom Models

For each model you want to access, add a [models."<provider_name>/<model_identifier>"] block:

  • provider: Must match the provider key under [providers.<key>] (e.g. llmgateway).
  • model: The exact model ID from the LLM Gateway catalog.
  • capabilities: An array containing capabilities the model supports, such as "image_in", "thinking", and "tool_use".
  • maxcontextsize: The maximum context window of the model.

Why Use LLM Gateway with Kimi Code CLI?

  • 210+ models — Access GPT-5, Gemini, Llama, DeepSeek, and more in a single CLI configuration.
  • Unified cost tracking — Get a detailed breakdown of costs per prompt and session in your dashboard.
  • Response caching — Automatically cache repeated requests (such as parsing or building commands) to save API costs.
  • Automatic fallback — Keep coding even if a provider encounters temporary downtime.
  • Volume discounts — Access selected models with up to 90% savings compared to standard pricing.