Mistral Large 2411
Mistral AI · dense · 123B parameters · 131,072 context
| Parameters | Context Window | Architecture | Best GPU | Cheapest API | Quality Score |
|---|---|---|---|---|---|
| 123B | 128K tokens | Dense | B200 SXM | $6.00/M | 75/100 |
Intelligence Brief
Mistral Large 2411 is a 123B-parameter dense model from Mistral AI. It uses Grouped Query Attention (GQA) with 88 layers and a hidden dimension of 12,288. With a 131,072-token context window, it supports tool use, structured output, code, math, multilingual text, and reasoning. On standardized benchmarks it scores 84 on MMLU, 53 on HumanEval, and 91.2 on GSM8K. The cheapest API deployment is via mistral at $6.00/M output tokens. For self-hosted inference, the B200 SXM is the top-ranked GPU, at an estimated $4261/month.
Architecture Details
Memory Requirements
| Precision | Weights |
|---|---|
| BF16 | 246.0 GB |
| FP8 | 123.0 GB |
| INT4 | 61.5 GB |
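The weight figures above are consistent with multiplying the parameter count by the storage width of each precision. The following is a minimal sketch of that arithmetic, assuming decimal gigabytes (1 GB = 1e9 bytes) and ignoring any quantization metadata such as scales or zero points; the function name is illustrative and does not come from the page.

```python
# Storage width per parameter for each precision, in bytes.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Return the weight footprint in decimal GB (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at the given precision and
    quantization overhead (scales, zero points) is ignored.
    """
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        print(f"{precision}: {weight_memory_gb(123e9, precision):.1f} GB")
```

Real checkpoints can differ slightly from these values because some tensors are often kept at higher precision.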
GPU Compatibility Matrix
Mistral Large 2411 is compatible with 21% of GPU configurations across 41 GPUs at 3 precision levels.
GPU Recommendations
| # | Configuration | Score | Throughput | Latency (ITL) | Est. TTFT | Cost/Month | Cost/M Tokens |
|---|---|---|---|---|---|---|---|
| 1 | FP8 · 1 GPU · tensorrt-llm | 100/100 | 280.0 tok/s | 3.6ms | 1ms | $4261 | $5.79 |
| 2 | FP8 · 1 GPU · tensorrt-llm | 100/100 | 280.0 tok/s | 3.6ms | 1ms | $4271 | $5.80 |
| 3 | FP8 · 1 GPU · tensorrt-llm | 100/100 | 280.0 tok/s | 3.6ms | 1ms | $6169 | $8.38 |
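The Cost/M Tokens values above can be reproduced from the monthly cost and the throughput. The sketch below assumes a 730-hour month (8,760 hours per year divided by 12) and full utilization at the listed throughput; the page does not state either assumption, but together they match all three listed figures.

```python
# Assumption: one month is 8,760 hours / 12 = 730 hours.
HOURS_PER_MONTH = 730


def cost_per_million_tokens(
    monthly_cost_usd: float,
    tokens_per_second: float,
    hours_per_month: float = HOURS_PER_MONTH,
) -> float:
    """Return USD per million generated tokens.

    Assumption: the GPU runs at the given throughput for the whole month.
    """
    tokens_per_month = tokens_per_second * 3600 * hours_per_month
    return monthly_cost_usd / (tokens_per_month / 1e6)


if __name__ == "__main__":
    for monthly in (4261, 4271, 6169):
        print(f"${monthly}/mo -> ${cost_per_million_tokens(monthly, 280.0):.2f}/M tokens")
```

At lower utilization the effective cost per token rises proportionally, so these figures are best read as a lower bound.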
Deployment Options
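| Option | Configuration | Cost | Notes |
|---|---|---|---|
| API Deployment | mistral | $6.00/M output tokens | |
| Single GPU | B200 SXM | $4261/mo | Min VRAM: 123 GB |
| Multi-GPU | H20 x2, tensor parallel (TP) | $1879/mo | 280.0 tok/s |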
API Pricing Comparison
| Provider | Input $/M | Output $/M | Badges |
|---|---|---|---|
| mistral | $2.00 | $6.00 | Cheapest |
Cost Analysis
| Provider | Input $/M | Output $/M | ~Monthly Cost |
|---|---|---|---|
| mistral (Best Value) | $2.00 | $6.00 | $40 |
Cost per 1,000 Requests
| Request size | Cost per 1,000 requests | Provider |
|---|---|---|
| Short (500 tok) | $2.20 | mistral |
| Medium (2K tok) | $8.80 | mistral |
| Long (8K tok) | $28.00 | mistral |
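The per-request figures above follow from the mistral list prices of $2.00/M input and $6.00/M output tokens once a split between input and output tokens is chosen. The page does not state that split, so the splits in this sketch (200/300, 800/1,200, and 5,000/3,000) are assumptions, picked because they reproduce the listed costs.

```python
# List prices from the API pricing table, in USD per million tokens.
INPUT_USD_PER_M = 2.00
OUTPUT_USD_PER_M = 6.00


def cost_per_1000_requests(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of 1,000 identical requests."""
    per_request = (
        input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    ) / 1e6
    return per_request * 1000


if __name__ == "__main__":
    # Assumed (input, output) token splits; not stated on the page.
    assumed_splits = {
        "Short (500 tok)": (200, 300),
        "Medium (2K tok)": (800, 1200),
        "Long (8K tok)": (5000, 3000),
    }
    for label, (n_in, n_out) in assumed_splits.items():
        print(f"{label}: ${cost_per_1000_requests(n_in, n_out):.2f}")
```

Because output tokens cost three times as much as input tokens, a workload with a different split can land well above or below these numbers.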
Performance Estimates
Throughput by GPU
VRAM Breakdown (B200 SXM, FP8)
Precision Impact
| Precision | Weights/GPU | Throughput |
|---|---|---|
| bf16 | 246.0 GB | not listed |
| fp8 | 123.0 GB | ~280.0 tok/s |
| int4 | 61.5 GB | not listed |
Quality Benchmarks
Capabilities
Features
Supported Frameworks
Supported Precisions
Where to Deploy Mistral Large 2411
Self-Hosted Infrastructure
Similar Models
| Model | Parameters | Architecture | Quality | Price |
|---|---|---|---|---|
| Mistral Large 2 | 123B | dense | 75 | from $2.50/M |
| Nemotron-3 Super 120B | 120B | dense | 84 | from $2.40/M |
| DBRX Base | 132B | moe | 50 | from $2.25/M |
| DBRX Instruct | 132B | moe | 50 | from $1.20/M |
| Command A | 111B | dense | 81 | from $10.00/M |
Frequently Asked Questions
How much VRAM does Mistral Large 2411 need for inference?
Mistral Large 2411 requires approximately 246.0 GB of VRAM for weights at BF16 precision, 123.0 GB at FP8, or 61.5 GB at INT4 quantization. Additional VRAM is needed for the KV cache (360,448 bytes per token) and activations (~3.00 GB).
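The per-token KV-cache figure can be derived from the attention shape. The sketch below assumes 8 key/value heads and a head dimension of 128 (values from the published Mistral Large configuration, not stated on this page) with a 2-byte cache; with the 88 layers listed here, that yields 360,448 bytes per token. The total estimate is a rough sum for a single sequence in decimal GB, and both function names are illustrative.

```python
def kv_bytes_per_token(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    bytes_per_value: int = 2,
) -> int:
    """Return KV-cache bytes per token.

    Each layer stores one key and one value vector of
    num_kv_heads * head_dim elements, hence the leading factor of 2.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


def total_vram_gb(
    weights_gb: float,
    context_tokens: int,
    kv_per_token: int,
    activations_gb: float = 3.0,
) -> float:
    """Rough VRAM estimate in decimal GB for one sequence."""
    return weights_gb + context_tokens * kv_per_token / 1e9 + activations_gb


if __name__ == "__main__":
    # Assumed attention shape: 8 KV heads, head_dim 128 (not from this page).
    per_token = kv_bytes_per_token(num_layers=88, num_kv_heads=8, head_dim=128)
    print(f"KV cache: {per_token} bytes/token")
    # FP8 weights with one sequence at the full 131,072-token context.
    print(f"Estimated total: {total_vram_gb(123.0, 131072, per_token):.1f} GB")
```

Serving several sequences concurrently multiplies the KV-cache term by the number of tokens held across all of them, which is usually what limits batch size at long context.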
What is the best GPU for Mistral Large 2411?
The top recommended GPU for Mistral Large 2411 is the B200 SXM using FP8 precision. It achieves approximately 280.0 tokens/sec at an estimated cost of $4261/month ($5.79/M tokens). Score: 100/100.
How much does Mistral Large 2411 inference cost?
Mistral Large 2411 API inference starts from $2.00/M input tokens and $6.00/M output tokens. Self-hosted inference costs depend on your GPU configuration; use our ROI calculator for a detailed breakdown.