Baichuan 2 13B
Baichuan · dense · 13B parameters · 4,096 context
| Parameters | Context Window | Architecture | Best GPU | Cheapest API |
|---|---|---|---|---|
| 13B | 4K tokens | Dense | A100 40GB SXM | $0.25/M |
Intelligence Brief
Baichuan 2 13B is a 13B-parameter dense model from Baichuan, featuring Multi-Head Attention (MHA) with 40 layers and a 5,120-dimensional hidden state. With a 4,096-token context window, it supports code, math, and multilingual tasks. The most cost-effective API deployment is via baichuan at $0.25/M output tokens. For self-hosted inference, the A100 40GB SXM delivers optimal throughput at $807/month.
Architecture Details
Memory Requirements
| Precision | Weights |
|---|---|
| BF16 | 26.0 GB |
| FP8 | 13.0 GB |
| INT4 | 6.5 GB |
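These figures follow directly from parameter count × bytes per parameter. A quick sketch (treating "13B" as an even 13.0e9 parameters, so real checkpoints may differ slightly):

```python
# Approximate weight memory: parameter count x bytes per parameter.
# "13B" is taken as an even 13.0e9 here; real checkpoints differ slightly.
PARAMS = 13.0e9

BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}

def weight_gb(params: float, precision: str) -> float:
    """Weight memory in GB (1 GB = 1e9 bytes, matching the table above)."""
    return params * BYTES_PER_PARAM[precision] / 1e9

for p in ("bf16", "fp8", "int4"):
    print(f"{p}: {weight_gb(PARAMS, p):.1f} GB")
# bf16: 26.0 GB / fp8: 13.0 GB / int4: 6.5 GB
```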
GPU Compatibility Matrix
Baichuan 2 13B is compatible with 82% of GPU configurations across 41 GPUs at 3 precision levels.
GPU Recommendations
| Config | Score | Throughput | Latency (ITL) | Est. TTFT | Cost/Month | Cost/M Tokens |
|---|---|---|---|---|---|---|
| A100 40GB SXM · BF16 · 1 GPU · vLLM | 95/100 | 322.9 tok/s | 3.1 ms | 1 ms | $807 | $0.95 |
| BF16 · 1 GPU · vLLM | 95/100 | 159.5 tok/s | 6.3 ms | 1 ms | $465 | $1.11 |
| BF16 · 1 GPU · vLLM | 95/100 | 144.5 tok/s | 6.9 ms | 1 ms | $399 | $1.05 |
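The Cost/M Tokens column can be derived from monthly cost and sustained throughput. A sketch, assuming 100% utilization over a 730-hour billing month (an assumption, but one that matches the figures above):

```python
# Cost per million tokens from monthly GPU cost and sustained throughput,
# assuming 100% utilization over a 730-hour month (an assumption that
# reproduces the table's figures).
HOURS_PER_MONTH = 730

def cost_per_m_tokens(monthly_usd: float, tok_per_s: float) -> float:
    tokens_per_month = tok_per_s * HOURS_PER_MONTH * 3600
    return monthly_usd / (tokens_per_month / 1e6)

print(round(cost_per_m_tokens(807, 322.9), 2))  # 0.95  (A100 40GB SXM row)
print(round(cost_per_m_tokens(465, 159.5), 2))  # 1.11
print(round(cost_per_m_tokens(399, 144.5), 2))  # 1.05
```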
Deployment Options
- **API Deployment** — baichuan, $0.25/M output tokens
- **Single GPU** — A100 40GB SXM, $807/mo (min VRAM: 13 GB)
- **Multi-GPU** — RTX 3090 x2, tensor parallel, 319.5 tok/s, $361/mo
API Pricing Comparison
| Provider | Input $/M | Output $/M | Badges |
|---|---|---|---|
| baichuan | $0.25 | $0.25 | Cheapest |
Cost Analysis
| Provider | Input $/M | Output $/M | ~Monthly Cost |
|---|---|---|---|
| baichuan (Best Value) | $0.25 | $0.25 | $3 |
Cost per 1,000 Requests
| Request Size | Cost per 1K Requests | Provider |
|---|---|---|
| Short (500 tok) | $0.17 | baichuan |
| Medium (2K tok) | $0.70 | baichuan |
| Long (8K tok) | $2.50 | baichuan |
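At baichuan's flat $0.25/M rate, a batch's cost is just total billed tokens times the rate. A sketch of the arithmetic (the table labels requests by output size, but the exact input/output split behind each figure is not published here, so total billed tokens per request is left as the free variable):

```python
# Cost of a batch of requests at a flat per-token rate. The input/output
# split behind the table's figures is not given, so we expose total billed
# tokens per request as a parameter.
PRICE_PER_M = 0.25  # $/M tokens; baichuan bills input and output equally

def batch_cost(requests: int, billed_tokens_per_request: int) -> float:
    return requests * billed_tokens_per_request * PRICE_PER_M / 1e6

# e.g. 1,000 requests billing 2,000 tokens each:
print(batch_cost(1000, 2000))  # 0.5  ($0.50)
```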
Performance Estimates
Throughput by GPU
VRAM Breakdown (A100 40GB SXM, BF16)
Precision Impact
| Precision | Weights/GPU | Est. Throughput |
|---|---|---|
| BF16 | 26.0 GB | ~322.9 tok/s |
| FP8 | 13.0 GB | — |
| INT4 | 6.5 GB | — |
Capabilities
Features
Supported Frameworks
Supported Precisions
Where to Deploy Baichuan 2 13B
Similar Models
| Model | Parameters | Architecture | Quality | API From |
|---|---|---|---|---|
| Baichuan 2 7B | 7B | dense | 50 | — |
| OLMo 2 13B | 13B | dense | 50 | — |
| Vicuna 13B | 13B | dense | 50 | — |
| Code Llama 13B | 13B | dense | 44 | $0.22/M |
| Llama 2 13B | 13B | dense | 47 | — |
Frequently Asked Questions
How much VRAM does Baichuan 2 13B need for inference?
Baichuan 2 13B requires approximately 26.0 GB of VRAM at BF16 precision, 13.0 GB at FP8, or 6.5 GB at INT4 quantization. Additional VRAM is needed for the KV-cache (819,200 bytes, about 0.8 MB, per token) and activations (~1.5 GB).
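The per-token KV-cache figure follows from the architecture stated above (40 layers, 5,120 hidden dim, MHA, BF16 cache). A sketch, which also gives a rough total serving footprint under those assumptions:

```python
# KV-cache bytes per token = 2 (K and V) x layers x hidden_dim x dtype bytes.
# Baichuan 2 13B uses MHA, so the full 5,120-dim hidden state is cached per layer.
LAYERS, HIDDEN, KV_DTYPE_BYTES = 40, 5120, 2  # BF16 cache

def kv_bytes_per_token() -> int:
    return 2 * LAYERS * HIDDEN * KV_DTYPE_BYTES

def total_vram_gb(weights_gb: float, context_tokens: int,
                  activations_gb: float = 1.5) -> float:
    """Rough serving footprint for one sequence at the given context length."""
    return weights_gb + kv_bytes_per_token() * context_tokens / 1e9 + activations_gb

print(kv_bytes_per_token())                 # 819200
print(round(total_vram_gb(26.0, 4096), 2))  # 30.86  (BF16 weights, full 4K context)
```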
What is the best GPU for Baichuan 2 13B?
The top recommended GPU for Baichuan 2 13B is the A100 40GB SXM using BF16 precision. It achieves approximately 322.9 tokens/sec at an estimated cost of $807/month ($0.95/M tokens). Score: 95/100.
How much does Baichuan 2 13B inference cost?
Baichuan 2 13B API inference starts from $0.25/M input tokens and $0.25/M output tokens. Self-hosted inference costs depend on your GPU configuration — use our ROI calculator for a detailed breakdown.
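A rough way to frame the API-vs-self-hosted decision: the GPU is a fixed monthly cost while the API bills per token, so there is a break-even monthly volume. A sketch, assuming the $807/mo A100 figure above and treating all traffic at the flat $0.25/M rate:

```python
# Break-even monthly token volume: below it the API is cheaper; above it
# the fixed-cost GPU wins. Assumes a flat per-token API rate.
def break_even_m_tokens(gpu_monthly_usd: float, api_price_per_m: float) -> float:
    return gpu_monthly_usd / api_price_per_m

# $807/mo A100 40GB SXM vs baichuan's $0.25/M:
print(break_even_m_tokens(807, 0.25))  # 3228.0 M tokens/month
```

Note that at the ~322.9 tok/s listed above, a single fully-utilized A100 produces well under 3,228M tokens/month, which is consistent with the self-hosted $0.95/M figure exceeding the $0.25/M API price.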