Llama 3.1 8B Instruct

Compact Llama 3.1 for efficient text generation.

llama-3.1-8b-instruct

STABLE, Model Deactivated

128,000 context

Starting at $0.02/M input tokens

Starting at $0.05/M output tokens
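
As a rough illustration of how the per-million-token starting prices above translate into a per-request cost, here is a minimal sketch. The function name `estimate_cost_usd` and the token counts in the usage line are hypothetical, and the default prices are simply the "starting at" figures listed above; actual billing depends on which provider serves the request.

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float = 0.02,   # USD per million input tokens (listed starting price)
    output_price_per_m: float = 0.05,  # USD per million output tokens (listed starting price)
) -> float:
    """Estimate the cost of one request from its token counts."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


# Hypothetical request: 10,000 prompt tokens and 2,000 generated tokens.
print(f"${estimate_cost_usd(10_000, 2_000):.6f}")
```

Passing a provider's own input and output prices instead of the defaults gives the estimate for that provider.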

Streaming

Tools

JSON Output

All Providers for Llama 3.1 8B Instruct

LLM Gateway routes each request to the best provider that can handle your prompt size and parameters.

Prices are in USD per million tokens. A "-" means no cached price is listed.

Provider        Context   Input    Cached   Output   Status
AWS Bedrock     128k      $0.22    -        $0.22    Deactivated since Apr 25, 2026
Nebius AI       128k      $0.02    -        $0.06    Deactivated since Apr 25, 2026
Inference.net   128k      $0.07    -        $0.33    Deactivated since Apr 25, 2026
Together AI     128k      $0.06    -        $0.06    Deactivated since Mar 27, 2026
Cerebras        128k      $0.10    -        $0.10    Deactivated since Apr 25, 2026
NovitaAI        16.4k     $0.02    -        $0.05    Deactivated since Apr 25, 2026
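
The page does not say how the "best" provider is chosen, so the following is only a sketch of one plausible policy, not LLM Gateway's actual routing logic: keep the providers whose context window fits the request, then pick the one with the lowest estimated cost. The `Provider` class and `pick_provider` function are hypothetical names, the context sizes are approximated from the listing above (128k as 128,000 and 16.4k as 16,400 tokens), and the deactivation status is ignored for the sake of illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Provider:
    name: str
    context_tokens: int   # approximate context window, from the listing
    input_price: float    # USD per million input tokens
    output_price: float   # USD per million output tokens


# Values copied from the provider listing; status is deliberately not modeled.
PROVIDERS = [
    Provider("AWS Bedrock", 128_000, 0.22, 0.22),
    Provider("Nebius AI", 128_000, 0.02, 0.06),
    Provider("Inference.net", 128_000, 0.07, 0.33),
    Provider("Together AI", 128_000, 0.06, 0.06),
    Provider("Cerebras", 128_000, 0.10, 0.10),
    Provider("NovitaAI", 16_400, 0.02, 0.05),
]


def pick_provider(
    prompt_tokens: int,
    max_output_tokens: int,
    providers: Sequence[Provider] = PROVIDERS,
) -> Optional[Provider]:
    """Return the cheapest provider whose context fits the request, or None."""
    needed = prompt_tokens + max_output_tokens
    eligible = [p for p in providers if p.context_tokens >= needed]
    if not eligible:
        return None

    # Assumed ranking rule: lowest estimated cost for this request.
    def cost(p: Provider) -> float:
        return prompt_tokens * p.input_price + max_output_tokens * p.output_price

    return min(eligible, key=cost)


small = pick_provider(10_000, 1_000)   # fits every listed context window
large = pick_provider(50_000, 1_000)   # too large for the 16.4k provider
print(small.name if small else None, large.name if large else None)
```

A real router would likely also weigh availability, latency, and supported parameters (streaming, tools, JSON output); cost is used here only because it is the one ranking signal the listing provides.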
