DBRX Instruct
Databricks · MoE · 132B parameters · 32,768-token context
| Parameters | Context Window | Architecture | Best GPU | Cheapest API |
|---|---|---|---|---|
| 132B | 32K tokens | MoE | B200 SXM | $1.20/M |
Intelligence Brief
DBRX Instruct is a 132B-parameter Mixture-of-Experts model (16 experts, 4 active per token) from Databricks, built on 40 transformer layers with a 6,144 hidden dimension and Grouped Query Attention (GQA). With a 32,768-token context window, it supports tool use, structured output, code, and math. The most cost-effective API deployment is via together at $1.20/M output tokens; for self-hosted inference, a single B200 SXM delivers the best throughput at an estimated $4261/month.
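For a quick hands-on check of the architecture described above, a minimal loading sketch with Hugging Face Transformers follows. It assumes transformers >= 4.40 (earlier releases need `trust_remote_code=True`), access to the gated `databricks/dbrx-instruct` repository, and enough GPU memory for the BF16 weights (~264 GB; see Memory Requirements below).

```python
# Minimal sketch: load DBRX Instruct with Hugging Face Transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "databricks/dbrx-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # ~264 GB of weights; see Memory Requirements
    device_map="auto",            # shard the weights across available GPUs
)

messages = [{"role": "user", "content": "What is a Mixture-of-Experts model?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```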
Architecture Details
Memory Requirements
| Precision | Weights |
|---|---|
| BF16 | 264.0 GB |
| FP8 | 132.0 GB |
| INT4 | 66.0 GB |
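These figures follow directly from parameter count times bytes per parameter; a quick sanity check (all 132B parameters must stay resident in an MoE model, even though only 4 of 16 experts are active per token):

```python
# Back-of-envelope check of the weight-memory figures above.
PARAMS = 132e9  # total parameters; every expert stays resident in memory

for precision, bytes_per_param in [("BF16", 2.0), ("FP8", 1.0), ("INT4", 0.5)]:
    weight_gb = PARAMS * bytes_per_param / 1e9
    print(f"{precision}: {weight_gb:.1f} GB")
# BF16: 264.0 GB, FP8: 132.0 GB, INT4: 66.0 GB
```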
GPU Compatibility Matrix
DBRX Instruct is compatible with roughly 20% of the evaluated GPU configurations, spanning 41 GPUs at 3 precision levels (BF16, FP8, INT4).
GPU Recommendations
| Configuration | Score | Throughput | Latency (ITL) | Est. TTFT | Cost/Month | Cost/M Tokens |
|---|---|---|---|---|---|---|
| B200 SXM · FP8 · 1 GPU · tensorrt-llm | 100/100 | 280.0 tok/s | 3.6ms | 1ms | $4261 | $5.79 |
| FP8 · 1 GPU · tensorrt-llm | 100/100 | 280.0 tok/s | 3.6ms | 1ms | $4271 | $5.80 |
| FP8 · 1 GPU · tensorrt-llm | 100/100 | 280.0 tok/s | 3.6ms | 1ms | $6169 | $8.38 |
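The Cost/M Tokens column appears to follow from Cost/Month divided by monthly token throughput at full utilization, assuming an average month of 365/12 days; the GPUs behind the second and third configurations are not named above, so only the first row can be tied to the B200 SXM. A minimal reproduction of the arithmetic:

```python
# Derive cost per million tokens from monthly GPU cost and sustained throughput.
SECONDS_PER_MONTH = 86_400 * 365 / 12          # average month, ~2.63M seconds

def cost_per_million_tokens(monthly_cost_usd: float, tok_per_s: float) -> float:
    tokens_per_month = tok_per_s * SECONDS_PER_MONTH
    return monthly_cost_usd / (tokens_per_month / 1e6)

for monthly_cost in (4261, 4271, 6169):
    print(f"${monthly_cost}/mo at 280 tok/s -> "
          f"${cost_per_million_tokens(monthly_cost, 280.0):.2f}/M tokens")
# -> $5.79, $5.80, $8.38
```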
Deployment Options
| Option | Configuration | Cost |
|---|---|---|
| API | together | $1.20/M output tokens |
| Single GPU | B200 SXM (min VRAM: 132 GB) | $4261/mo |
| Multi-GPU | H200 SXM x2, tensor parallel (TP), 280.0 tok/s | $5106/mo |
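As an illustration of the tensor-parallel multi-GPU option, the sketch below uses vLLM's Python API purely for brevity (the recommendations above name TensorRT-LLM as the serving engine); the model ID, FP8 quantization setting, and 2-way parallelism are assumptions matching the H200 SXM x2 card, not a verified recipe.

```python
# Sketch: 2-way tensor-parallel serving of DBRX Instruct with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(
    model="databricks/dbrx-instruct",
    tensor_parallel_size=2,      # e.g. 2x H200 SXM as in the Multi-GPU card
    quantization="fp8",          # FP8 weights need ~132 GB total (see above)
    max_model_len=32768,
)
outputs = llm.generate(
    ["Explain tensor parallelism in one short paragraph."],
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```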
API Pricing Comparison
| Provider | Input $/M | Output $/M | Badges |
|---|---|---|---|
| together | $1.20 | $1.20 | Cheapest |
| databricks | $0.75 | $2.25 | Low Input |
Cost Analysis
| Provider | Input $/M | Output $/M | ~Monthly Cost |
|---|---|---|---|
| together (Best Value) | $1.20 | $1.20 | $12 |
| databricks | $0.75 | $2.25 | $15 |
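The ~Monthly Cost column is consistent with an assumed volume of 5M input plus 5M output tokens per month (the page does not state this volume explicitly); a minimal reconstruction:

```python
# Reproduce the ~Monthly Cost column under an assumed token volume.
ASSUMED_INPUT_M, ASSUMED_OUTPUT_M = 5, 5      # millions of tokens/month (assumption)

providers = {
    "together":   (1.20, 1.20),   # $/M input, $/M output
    "databricks": (0.75, 2.25),
}
for name, (price_in, price_out) in providers.items():
    monthly = ASSUMED_INPUT_M * price_in + ASSUMED_OUTPUT_M * price_out
    print(f"{name}: ${monthly:.0f}/month")
# together: $12/month, databricks: $15/month
```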
Cost per 1,000 Requests
| Request Size | Cost per 1,000 Requests | Provider |
|---|---|---|
| Short (500 tok) | $0.84 | together |
| Medium (2K tok) | $3.36 | together |
| Long (8K tok) | $12.00 | together |
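These per-request figures are consistent with together's flat $1.20/M rate once some prompt tokens are added to the listed output lengths; the input sizes below (200 / 800 / 2,000 tokens) are assumptions chosen to reproduce the quoted totals:

```python
# Reconstruct the per-1,000-request costs at together's flat token rate.
PRICE_PER_M = 1.20   # $/M, same for input and output on together

requests = {                    # (assumed input tokens, listed output tokens)
    "Short (500 tok)":  (200, 500),
    "Medium (2K tok)":  (800, 2_000),
    "Long (8K tok)":    (2_000, 8_000),
}
for label, (tok_in, tok_out) in requests.items():
    cost_per_1k = 1_000 * (tok_in + tok_out) / 1e6 * PRICE_PER_M
    print(f"{label}: ${cost_per_1k:.2f} per 1,000 requests")
# $0.84, $3.36, $12.00
```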
Performance Estimates
[Chart: Throughput by GPU]
[Chart: VRAM Breakdown (B200 SXM, FP8)]
Precision Impact
| Precision | Weights/GPU | Est. Throughput |
|---|---|---|
| BF16 | 264.0 GB | n/a |
| FP8 | 132.0 GB | ~280.0 tok/s |
| INT4 | 66.0 GB | n/a |
Capabilities
Features: tool use, structured output, code, math
Supported Frameworks: TensorRT-LLM (per the GPU recommendations above)
Supported Precisions: BF16, FP8, INT4
Where to Deploy DBRX Instruct
Self-Hosted Infrastructure
Similar Models
| Model | Params | Architecture | Quality | Price From |
|---|---|---|---|---|
| DBRX Base | 132B | MoE | 50 | $2.25/M |
| Mistral Large 2411 | 123B | dense | 75 | $6.00/M |
| Mistral Large 2 | 123B | dense | 75 | $2.50/M |
| Mixtral 8x22B | 141B | MoE | 65 | $1.20/M |
| Nemotron-3 Super 120B | 120B | dense | 84 | $2.40/M |
Frequently Asked Questions
How much VRAM does DBRX Instruct need for inference?
DBRX Instruct requires approximately 264.0 GB of VRAM for BF16 weights, 132.0 GB at FP8, or 66.0 GB at INT4 quantization. Additional VRAM is needed for the KV cache (163,840 bytes, ~160 KB, per token) and activations (~2.00 GB).
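For reference, the 163,840 bytes/token figure matches DBRX's grouped-query attention layout, assuming 40 layers, 8 KV heads, a head dimension of 128, and 2-byte (FP16/BF16) cache entries; the KV-head count and head dimension are inferred to match that figure, while the layer count and hidden size are stated in the brief above.

```python
# KV-cache size per token: 2 (K and V) x layers x KV heads x head dim x bytes.
LAYERS, KV_HEADS, HEAD_DIM, BYTES_PER_VALUE = 40, 8, 128, 2   # FP16/BF16 cache

kv_bytes_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
print(kv_bytes_per_token)                        # 163840 (~160 KB per token)
print(kv_bytes_per_token * 32_768 / 1e9)         # ~5.37 GB for a full 32K context
```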
What is the best GPU for DBRX Instruct?
The top recommended GPU for DBRX Instruct is the B200 SXM using FP8 precision. It achieves approximately 280.0 tokens/sec at an estimated cost of $4261/month ($5.79/M tokens). Score: 100/100.
How much does DBRX Instruct inference cost?
DBRX Instruct API pricing starts at $0.75/M input tokens (databricks) and $1.20/M output tokens (together). Self-hosted inference costs depend on your GPU configuration; use our ROI calculator for a detailed breakdown.
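As a rough sketch of that trade-off using only numbers quoted on this page, the effective self-hosted cost per million tokens rises quickly when a dedicated B200 SXM is underutilized, while the together rate stays flat; the utilization levels below are illustrative assumptions:

```python
# Effective self-hosted $/M tokens at different utilization levels vs. API pricing.
GPU_MONTHLY_USD = 4261.0                                  # B200 SXM, FP8
GPU_TOKENS_PER_MONTH_M = 280.0 * 86_400 * 365 / 12 / 1e6  # ~736M tokens at full load
API_PRICE_PER_M = 1.20                                    # together, output tokens

for utilization in (0.25, 0.50, 1.00):
    tokens_m = GPU_TOKENS_PER_MONTH_M * utilization
    self_hosted_per_m = GPU_MONTHLY_USD / tokens_m        # fixed cost over fewer tokens
    print(f"{utilization:.0%} utilization: self-hosted ${self_hosted_per_m:.2f}/M "
          f"vs together ${API_PRICE_PER_M:.2f}/M")
# 100% -> ~$5.79/M; lower utilization makes self-hosting proportionally pricier
```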