Codestral 22B
Mistral AI · dense · 22B parameters · 32,768-token context
Parameters
22B
Context Window
32K tokens
Architecture
Dense
Best GPU
H20
Cheapest API
$0.90/M
Quality Score
63/100
Intelligence Brief
Codestral 22B is a 22-billion-parameter dense model from Mistral AI, featuring Grouped Query Attention (GQA) with 56 layers and a 6,144-dimension hidden state. With a 32,768-token context window, it supports code and math workloads. On standardized benchmarks, it scores MMLU 65, HumanEval 58, and GSM8K 60. The most cost-effective API deployment is via mistral at $0.90/M output tokens; for self-hosted inference, an H20 delivers optimal throughput at an estimated $940/month.
Architecture Details
Memory Requirements
| Precision | Weights |
|---|---|
| BF16 | 44.0 GB |
| FP8 | 22.0 GB |
| INT4 | 11.0 GB |
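The weight figures above follow directly from parameter count × bytes per parameter. A quick sketch (function and constant names are mine, not from the page):

```python
# Estimate raw weight memory for Codestral 22B at several precisions.
# 22e9 parameters comes from the model card; bytes-per-parameter values
# are the standard storage sizes for each format.
PARAMS = 22e9

BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}

def weight_gb(params: float, precision: str) -> float:
    """Weight memory in GB (decimal) for a given precision."""
    return params * BYTES_PER_PARAM[precision] / 1e9

for p in ("bf16", "fp8", "int4"):
    print(p, round(weight_gb(PARAMS, p), 1), "GB")  # 44.0 / 22.0 / 11.0
```

Note this covers weights only; KV-cache and activations add to the total, as the FAQ below breaks down.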
GPU Compatibility Matrix
Codestral 22B is compatible with 74% of tested GPU configurations (41 GPUs at 3 precision levels).
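A compatibility matrix like this reduces to a fit check: a configuration passes when weights plus KV-cache plus activation headroom stay under the card's VRAM. A minimal sketch, using the page's FP8 weight size (22 GB) and ~1.5 GB activation estimate; the 7.5 GB full-context KV-cache default and the VRAM figures in the demo are my assumptions:

```python
# Rough GPU-fit check behind a compatibility matrix: weights + KV-cache
# + activations must fit in VRAM. Defaults are illustrative assumptions
# (7.5 GB KV-cache ~ full 32K context at 229,376 bytes/token).
def fits(vram_gb: float, weights_gb: float,
         kv_cache_gb: float = 7.5, activations_gb: float = 1.5) -> bool:
    return vram_gb >= weights_gb + kv_cache_gb + activations_gb

print(fits(96.0, 22.0))  # H20-class card (96 GB), FP8 weights -> True
print(fits(24.0, 22.0))  # a 24 GB card cannot hold FP8 + full KV-cache
```

Real scorers also weigh throughput and cost, which is why compatible configurations still rank differently below.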
GPU Recommendations
| Config | Score | Throughput | Latency (ITL) | Est. TTFT | Cost/Month | Cost/M Tokens |
|---|---|---|---|---|---|---|
| FP8 · 1 GPU · tensorrt-llm | 100/100 | 1.1K tok/s | 1.0 ms | 0 ms | $940 | $0.34 |
| FP8 · 1 GPU · tensorrt-llm | 95/100 | 1.1K tok/s | 1.0 ms | 0 ms | $1,794 | $0.65 |
| FP8 · 1 GPU · tensorrt-llm | 95/100 | 760.5 tok/s | 1.3 ms | 0 ms | $1,794 | $0.90 |
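The Cost/M Tokens figures above follow from monthly GPU cost divided by monthly token output at sustained throughput. A sketch under a 30-day-month assumption (the page does not state its billing basis):

```python
# Derive $/M tokens from monthly cost and sustained throughput.
SECONDS_PER_MONTH = 86400 * 30  # 30-day month, assumed

def cost_per_m_tokens(monthly_usd: float, tok_per_s: float) -> float:
    tokens_m = tok_per_s * SECONDS_PER_MONTH / 1e6  # millions of tokens
    return monthly_usd / tokens_m

print(round(cost_per_m_tokens(940, 1100), 2))    # ~0.33, vs $0.34 quoted
print(round(cost_per_m_tokens(1794, 760.5), 2))  # ~0.91, vs $0.90 quoted
```

The small gaps against the quoted figures are consistent with rounding of the throughput numbers or a utilization discount in the original calculation.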
Deployment Options
API Deployment
mistral
$0.90/M
output tokens
Single GPU
H20
$940/mo
Min VRAM: 22 GB
Multi-GPU
A100 40GB SXM x2
375.0 tok/s
TP · $1,613/mo
API Pricing Comparison
| Provider | Input $/M | Output $/M | Badges |
|---|---|---|---|
| mistral | $0.30 | $0.90 | Cheapest |
| together | $0.90 | $0.90 | |
Cost Analysis
| Provider | Input $/M | Output $/M | ~Monthly Cost |
|---|---|---|---|
| mistral (Best Value) | $0.30 | $0.90 | $6 |
| together | $0.90 | $0.90 | $9 |
Cost per 1,000 Requests
Short (500 tok)
$0.33
via mistral
Medium (2K tok)
$1.32
via mistral
Long (8K tok)
$4.20
via mistral
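These per-1,000-request figures come from multiplying per-token API prices by the tokens in each request class. The input/output split per request is not stated on the page; the 800-in / 1,200-out split below is an assumption that happens to reproduce the medium (2K token) figure at mistral's $0.30/$0.90 pricing:

```python
# Cost of 1,000 requests given $/M-token input and output prices.
def cost_per_1k_requests(in_tok: int, out_tok: int,
                         in_price: float, out_price: float) -> float:
    """USD for 1,000 requests; prices are $ per million tokens."""
    in_m = 1000 * in_tok / 1e6    # millions of input tokens
    out_m = 1000 * out_tok / 1e6  # millions of output tokens
    return in_m * in_price + out_m * out_price

print(round(cost_per_1k_requests(800, 1200, 0.30, 0.90), 2))  # 1.32
```

The short and long figures imply different splits, so treat the split as a per-workload parameter rather than a constant.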
Performance Estimates
Throughput by GPU
VRAM Breakdown (H20, FP8)
Precision Impact
bf16
44.0 GB
weights/GPU
fp8
22.0 GB
weights/GPU
~1.1K tok/s
int4
11.0 GB
weights/GPU
Similar Models
Codestral Mamba 7B
7.3B params · hybrid
Quality: 50
from $0.60/M
Solar Pro 22B
22B params · dense
Quality: 50
from $0.50/M
Mistral Small 24B
24B params · dense
Quality: 68
from $0.30/M
Mistral Small 3.1 24B
24B params · dense
Quality: 50
from $0.30/M
GigaChat 20B
20B params · dense
Quality: 50
Frequently Asked Questions
How much VRAM does Codestral 22B need for inference?
Codestral 22B requires approximately 44.0 GB of VRAM at BF16 precision, 22.0 GB at FP8, or 11.0 GB at INT4 quantization. Additional VRAM is needed for the KV-cache (229,376 bytes per token) and activations (~1.50 GB).
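The per-token KV-cache figure can be reconstructed from the layer count. The 8 KV-head / 128 head-dim GQA layout below is an assumption, but it is consistent with the stated 56 layers and the quoted byte count:

```python
# Reconstruct the per-token KV-cache size for Codestral 22B.
# 56 layers is from the model card; 8 KV heads x 128 head dim is an
# assumed GQA layout that matches the quoted 229,376 bytes/token.
LAYERS, KV_HEADS, HEAD_DIM = 56, 8, 128
BYTES = 2  # 16-bit K/V entries

per_token = LAYERS * 2 * KV_HEADS * HEAD_DIM * BYTES  # 2 = K and V
print(per_token)                           # 229376
print(round(per_token * 32768 / 1e9, 2))   # ~7.52 GB at full 32K context
```

At the full 32,768-token context this adds roughly 7.5 GB on top of the weights, which is why a 24 GB card is tight for FP8 despite the 22 GB weight footprint.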
What is the best GPU for Codestral 22B?
The top recommended GPU for Codestral 22B is the H20 using FP8 precision. It achieves approximately 1.1K tokens/sec at an estimated cost of $940/month ($0.34/M tokens). Score: 100/100.
How much does Codestral 22B inference cost?
Codestral 22B API inference starts from $0.30/M input tokens and $0.90/M output tokens. Self-hosted inference costs depend on your GPU configuration — use our ROI calculator for a detailed breakdown.
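As a rough rule of thumb before reaching for a full ROI calculator, you can estimate the API-vs-self-hosted break-even from the two numbers on this page. This ignores input-token cost and assumes the GPU is kept busy, so it is an upper bound on the volume needed:

```python
# Break-even monthly output-token volume: where mistral's $0.90/M output
# price matches the $940/month H20 estimate quoted above. Ignores
# input-token cost and GPU utilization, so treat as a rough bound.
API_OUT_PRICE = 0.90   # $/M output tokens (mistral)
GPU_MONTHLY = 940.0    # $/month, H20 FP8 estimate from this page

break_even_m_tokens = GPU_MONTHLY / API_OUT_PRICE
print(round(break_even_m_tokens))  # ~1044 M output tokens/month
```

Below roughly a billion output tokens per month, the API is cheaper; above it, the self-hosted H20 wins on raw cost.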