DBRX Base
Databricks · moe · 132B parameters · 32,768 context
Parameters
132B
Context Window
32K tokens
Architecture
MoE
Best GPU
B200 SXM
Cheapest API
$2.25/M
Intelligence Brief
DBRX Base is a 132B-parameter Mixture-of-Experts model (16 experts, 4 active per token, ~36B active parameters) from Databricks, featuring Grouped Query Attention (GQA) with 40 layers and a 6,144 hidden dimension. With a 32,768-token context window, it is suited to code and math workloads. The most cost-effective API deployment is via databricks at $2.25/M output tokens. For self-hosted inference, the B200 SXM delivers optimal throughput at $4261/month.
Architecture Details
Memory Requirements
BF16 Weights
264.0 GB
FP8 Weights
132.0 GB
INT4 Weights
66.0 GB
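The weight footprints above follow directly from parameter count times bytes per parameter; a minimal sketch (using 1 GB = 10^9 bytes, as the figures above do):

```python
# Weight memory = parameter count × bytes per parameter.
# 132B parameters; bytes/param: BF16 = 2, FP8 = 1, INT4 = 0.5.
PARAMS = 132e9

def weight_gb(bytes_per_param: float) -> float:
    """Raw weight footprint in GB (excludes KV cache and activations)."""
    return PARAMS * bytes_per_param / 1e9

for name, bpp in [("BF16", 2.0), ("FP8", 1.0), ("INT4", 0.5)]:
    print(f"{name}: {weight_gb(bpp):.1f} GB")
# BF16: 264.0 GB, FP8: 132.0 GB, INT4: 66.0 GB
```

Note that total VRAM needed is higher than the weight footprint alone, since KV cache and activations (covered in the FAQ below) come on top.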
GPU Compatibility Matrix
DBRX Base is compatible with 20% of GPU configurations across 41 GPUs at 3 precision levels.
GPU Recommendations
| Config | Score | Throughput | Latency (ITL) | Est. TTFT | Cost/Month | Cost/M Tokens |
|---|---|---|---|---|---|---|
| FP8 · 1 GPU · tensorrt-llm | 100/100 | 280.0 tok/s | 3.6ms | 1ms | $4261 | $5.79 |
| FP8 · 1 GPU · tensorrt-llm | 100/100 | 280.0 tok/s | 3.6ms | 1ms | $4271 | $5.80 |
| FP8 · 1 GPU · tensorrt-llm | 100/100 | 280.0 tok/s | 3.6ms | 1ms | $6169 | $8.38 |
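The Cost/M Tokens figures are derivable from monthly cost and sustained throughput, assuming 100% utilization over an average (~30.42-day) month:

```python
SECONDS_PER_MONTH = 86400 * 30.42  # average month length; an assumption

def cost_per_m_tokens(monthly_usd: float, tok_per_s: float) -> float:
    """$/M tokens at full utilization: monthly cost / millions of tokens generated."""
    m_tokens = tok_per_s * SECONDS_PER_MONTH / 1e6
    return monthly_usd / m_tokens

print(round(cost_per_m_tokens(4261, 280.0), 2))  # 5.79
print(round(cost_per_m_tokens(6169, 280.0), 2))  # 8.38
```

Real deployments rarely sustain 100% utilization, so effective $/M tokens will be higher in proportion to idle time.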
Deployment Options
API Deployment
databricks
$2.25/M
output tokens
Single GPU
B200 SXM
$4261/mo
Min VRAM: 132 GB
Multi-GPU
H200 SXM x2
280.0 tok/s
TP· $5106/mo
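For the multi-GPU option, tensor parallelism (TP) splits the weights evenly across GPUs; a quick sanity check that FP8 weights fit two H200 SXM cards (141 GB per card is the published H200 spec):

```python
H200_VRAM_GB = 141.0    # H200 SXM memory capacity
FP8_WEIGHTS_GB = 132.0  # FP8 weight footprint from the table above
TP_DEGREE = 2           # tensor-parallel GPUs

per_gpu_weights = FP8_WEIGHTS_GB / TP_DEGREE
headroom = H200_VRAM_GB - per_gpu_weights
print(per_gpu_weights, headroom)  # 66.0 GB weights/GPU, 75.0 GB left for KV cache
```

The remaining headroom per GPU is what bounds batch size and context length via the KV cache.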
API Pricing Comparison
| Provider | Input $/M | Output $/M | Badges |
|---|---|---|---|
| databricks | $0.75 | $2.25 | Cheapest |
Cost Analysis
| Provider | Input $/M | Output $/M | ~Monthly Cost |
|---|---|---|---|
| databricks (Best Value) | $0.75 | $2.25 | $15 |
Cost per 1,000 Requests
Short (500 tok)
$0.82
via databricks
Medium (2K tok)
$3.30
via databricks
Long (8K tok)
$10.50
via databricks
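The per-1,000-request figures follow from the databricks rates and an assumed input/output token split; the split below (40% input / 60% output) is an illustrative assumption that reproduces the short and medium figures — the long (8K) figure implies a more input-heavy mix:

```python
IN_RATE, OUT_RATE = 0.75, 2.25  # databricks $/M input and output tokens

def cost_per_1k_requests(tokens_per_request: int, input_frac: float) -> float:
    """Cost of 1,000 requests given an assumed input-token share."""
    m_tokens = 1000 * tokens_per_request / 1e6
    return m_tokens * (input_frac * IN_RATE + (1 - input_frac) * OUT_RATE)

print(cost_per_1k_requests(500, 0.4))   # ≈ 0.82 (short)
print(cost_per_1k_requests(2000, 0.4))  # ≈ 3.30 (medium)
```

Because output tokens cost 3x input tokens here, the blended rate is sensitive to the split; measure your own workload's ratio before budgeting.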
Performance Estimates
Throughput by GPU
VRAM Breakdown (B200 SXM, FP8)
Precision Impact
| Precision | Weights/GPU | Est. Throughput |
|---|---|---|
| bf16 | 264.0 GB | — |
| fp8 | 132.0 GB | ~280.0 tok/s |
| int4 | 66.0 GB | — |
Similar Models
DBRX Instruct
132B params · moe
Quality: 50
from $1.20/M
Mistral Large 2411
123B params · dense
Quality: 75
from $6.00/M
Mistral Large 2
123B params · dense
Quality: 75
from $2.50/M
Mixtral 8x22B
141B params · moe
Quality: 65
from $1.20/M
Nemotron-3 Super 120B
120B params · dense
Quality: 84
from $2.40/M
Frequently Asked Questions
How much VRAM does DBRX Base need for inference?
DBRX Base requires approximately 264.0 GB of VRAM at BF16 precision, 132.0 GB at FP8, or 66.0 GB at INT4 quantization. Additional VRAM is needed for the KV cache (163,840 bytes per token) and activations (~2.00 GB).
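The KV-cache figure follows from the GQA geometry. The layer count (40) is stated above; 8 KV heads with a head dimension of 128 are assumptions, chosen to be consistent with the 163,840 bytes/token figure:

```python
# KV cache per token = 2 (K and V) × layers × kv_heads × head_dim × bytes/element.
LAYERS, KV_HEADS, HEAD_DIM = 40, 8, 128  # KV_HEADS and HEAD_DIM assumed
BYTES_PER_ELEM = 2                        # FP16/BF16 cache

kv_bytes_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_PER_ELEM
print(kv_bytes_per_token)  # 163840

# A single sequence at the full 32,768-token context:
print(kv_bytes_per_token * 32768 / 1e9)  # ≈ 5.37 GB
```

GQA keeps this small relative to the weights: with far fewer KV heads than query heads, even a full-context sequence adds only a few GB.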
What is the best GPU for DBRX Base?
The top recommended GPU for DBRX Base is the B200 SXM using FP8 precision. It achieves approximately 280.0 tokens/sec at an estimated cost of $4261/month ($5.79/M tokens). Score: 100/100.
How much does DBRX Base inference cost?
DBRX Base API inference starts from $0.75/M input tokens and $2.25/M output tokens. Self-hosted inference costs depend on your GPU configuration — use our ROI calculator for a detailed breakdown.
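One way to compare the two routes is a rough breakeven volume: the monthly output-token count at which the self-hosted B200 estimate above matches the API bill (a sketch that ignores input-token charges and assumes full utilization):

```python
API_OUT_RATE = 2.25       # databricks $/M output tokens
SELF_HOSTED_USD = 4261.0  # B200 SXM monthly estimate from above

breakeven_m_tokens = SELF_HOSTED_USD / API_OUT_RATE
print(round(breakeven_m_tokens))  # ≈ 1894 M output tokens/month
```

At ~280 tok/s, a single B200 produces only ~736M tokens in a month of continuous generation — below this breakeven — which is consistent with the self-hosted $5.79/M exceeding the API's $2.25/M. Self-hosting wins on other axes (data control, latency, batching across models) rather than raw per-token price here.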