MiniMax-Text-01
MiniMax · MoE · 456B parameters · 1,048,576-token context
MiniMax-Text-01 is MiniMax's 456B-parameter Mixture-of-Experts (MoE) model with 45.9B parameters active per forward pass and a 1,048,576-token context window. With 32 experts and 2 active per token, it achieves strong parameter efficiency while maintaining competitive quality scores. Based on InferenceBench analysis, the optimal deployment configuration is the B200 NVL (pair) (x2) at FP8 precision, achieving approximately 280.0 tokens/second at $27.08 per million tokens.
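The 2-of-32 expert routing can be illustrated with a minimal top-2 softmax gate. This is a toy sketch in plain Python, not MiniMax's actual router; the logit values are made up. It shows why only ~45.9B of the 456B parameters (roughly 10%) run per token: each token's FFN work is dispatched to just two experts.

```python
import math

NUM_EXPERTS = 32   # experts per MoE layer (from the model card)
TOP_K = 2          # experts activated per token

def top2_route(router_logits):
    """Pick the top-2 experts for one token and renormalize their gate weights."""
    top = sorted(range(NUM_EXPERTS), key=lambda i: router_logits[i], reverse=True)[:TOP_K]
    # softmax over only the selected experts' logits
    exps = [math.exp(router_logits[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# toy logits for one token: experts 5 and 17 score highest
logits = [0.0] * NUM_EXPERTS
logits[5], logits[17] = 2.0, 1.0
print(top2_route(logits))  # experts 5 and 17 carry all the gate weight
```

The selected experts' outputs would then be summed, weighted by these gate values, so the compute per token scales with `TOP_K`, not with `NUM_EXPERTS`.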
Memory Requirements

| Precision | Weights |
|---|---|
| BF16 | 912.0 GB |
| FP8 | 456.0 GB |
| INT4 | 228.0 GB |
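The weight footprints follow directly from bytes per parameter, a quick back-of-envelope check (decimal GB, as used in the table):

```python
PARAMS = 456e9  # total parameters

# bytes per parameter at each precision
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "INT4": 0.5}

weights_gb = {p: PARAMS * b / 1e9 for p, b in BYTES_PER_PARAM.items()}
print(weights_gb)  # {'BF16': 912.0, 'FP8': 456.0, 'INT4': 228.0}
```

Note these figures cover weights only; KV-cache and activation memory come on top (see the FAQ below for those terms).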
GPU Recommendations
| Configuration | Score | Throughput | Cost/Month | Cost/M Tokens |
|---|---|---|---|---|
| FP8 · 2 GPUs · tensorrt-llm | 100/100 | 280.0 tok/s | $19929 | $27.08 |
| FP8 · 4 GPUs · tensorrt-llm | 98/100 | 280.0 tok/s | $17044 | $23.16 |
| FP8 · 4 GPUs · tensorrt-llm | 98/100 | 280.0 tok/s | $17082 | $23.21 |
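The per-million-token figures above can be reproduced from the monthly cost and sustained throughput. A sketch assuming 100% utilization and an average ~30.44-day month (small rounding differences versus the table are expected):

```python
def cost_per_million_tokens(monthly_cost_usd, tokens_per_sec, days_per_month=30.44):
    """Fixed monthly GPU cost divided by sustained monthly token output."""
    million_tokens_per_month = tokens_per_sec * 86_400 * days_per_month / 1e6
    return monthly_cost_usd / million_tokens_per_month

print(round(cost_per_million_tokens(19929, 280.0), 2))  # ≈ 27.06, close to the table's $27.08
print(round(cost_per_million_tokens(17044, 280.0), 2))  # ≈ 23.14, close to the table's $23.16
```

Any idle time raises the effective $/M tokens proportionally, since the monthly GPU cost is fixed.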
API Pricing Comparison
| Provider | Input $/M | Output $/M | Badges |
|---|---|---|---|
| minimax | $1.00 | $5.00 | Cheapest |
Frequently Asked Questions
How much VRAM does MiniMax-Text-01 need for inference?
MiniMax-Text-01 requires approximately 912.0 GB of VRAM at BF16 precision, 456.0 GB at FP8, or 228.0 GB at INT4 quantization. Additional VRAM is needed for the KV-cache (163,840 bytes per token) and activations (~3.0 GB).
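These three terms combine into a total VRAM estimate. A sketch for a single sequence (batching multiplies the KV-cache term by the number of concurrent sequences; decimal GB throughout):

```python
KV_BYTES_PER_TOKEN = 163_840   # KV-cache cost per token (from the FAQ above)
ACTIVATIONS_GB = 3.0           # rough activation overhead
WEIGHTS_GB = {"BF16": 912.0, "FP8": 456.0, "INT4": 228.0}

def total_vram_gb(precision, context_tokens):
    """Weights + KV-cache for one sequence + activation overhead, in decimal GB."""
    kv_gb = KV_BYTES_PER_TOKEN * context_tokens / 1e9
    return WEIGHTS_GB[precision] + kv_gb + ACTIVATIONS_GB

# the full 1,048,576-token window adds ~171.8 GB of KV-cache on its own
print(round(total_vram_gb("FP8", 1_048_576), 1))  # ≈ 630.8
```

At full context the KV-cache is a substantial fraction of the weight footprint, which is why long-context serving often pairs quantized weights with a quantized or paged KV-cache.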
What is the best GPU for MiniMax-Text-01?
The top recommended GPU for MiniMax-Text-01 is the B200 NVL (pair) (x2) using FP8 precision. It achieves approximately 280.0 tokens/sec at an estimated cost of $19929/month ($27.08/M tokens). Score: 100/100.
How much does MiniMax-Text-01 inference cost?
MiniMax-Text-01 API inference starts from $1.00/M input tokens and $5.00/M output tokens. Self-hosted inference costs depend on your GPU configuration — use our ROI calculator for a detailed breakdown.
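A simplified break-even sketch, comparing the fixed monthly cost of the cheapest self-hosted configuration above against pay-per-token API pricing. It ignores input-token pricing, latency, and capacity limits; note that at 280 tok/s a single node sustains only ~736M tokens/month, well short of the break-even volume, which is why the per-token figures above favor the API at these prices:

```python
API_OUTPUT_PER_M = 5.00          # MiniMax API output price, $/M tokens (table above)
SELF_HOSTED_MONTHLY = 17_044.0   # fixed monthly cost of the cheapest 4-GPU config

def cheaper_option(million_output_tokens_per_month):
    """Compare fixed self-hosting cost against pay-per-token API pricing."""
    api_cost = API_OUTPUT_PER_M * million_output_tokens_per_month
    return "self-hosted" if SELF_HOSTED_MONTHLY < api_cost else "API"

# break-even volume: 17044 / 5.00 ≈ 3,409M output tokens per month
print(cheaper_option(1000))   # API
print(cheaper_option(5000))   # self-hosted
```

A real comparison would also weigh the input-token price, batching efficiency, and multi-node scaling, which is what a full ROI calculation covers.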