MiniMax M2.7
MiniMax · moe · 456B parameters · 1,048,576 context
Parameters
456B
Context Window
1024K tokens
Architecture
MoE
Best GPU
B200 NVL (pair)
Cheapest API
$2.80/M
Quality Score
82/100
Intelligence Brief
MiniMax M2.7 is a 456B parameter Mixture-of-Experts (MoE) model from MiniMax, featuring Grouped Query Attention (GQA) with 80 layers and a hidden dimension of 6,144. With a 1,048,576-token context window, it supports tool use, vision, structured output, code, math, multilingual tasks, and reasoning. On standardized benchmarks, it achieves MMLU 88, HumanEval 68, and GSM8K 92. The most cost-effective API deployment is via minimax at $2.80/M output tokens. For self-hosted inference, the B200 NVL (pair) delivers optimal throughput at $19,929/month.
Architecture Details
Memory Requirements
BF16 Weights
912.0 GB
FP8 Weights
456.0 GB
INT4 Weights
228.0 GB
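These figures follow directly from the parameter count: weights occupy roughly parameters × bytes-per-parameter. A minimal sketch reproducing the table above, assuming 1 GB = 10⁹ bytes:

```python
# Weight-memory estimates from parameter count and precision.
# 456e9 parameters comes from the spec above; GB here means 1e9 bytes.
PARAMS = 456e9

BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}

for precision, nbytes in BYTES_PER_PARAM.items():
    gb = PARAMS * nbytes / 1e9
    print(f"{precision}: {gb:.1f} GB")  # bf16: 912.0, fp8: 456.0, int4: 228.0
```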
Fits on (single GPU) — most practical first
Fits on (multi-GPU with Tensor Parallelism)
Multi-GPU configurations use Tensor Parallelism (TP) to split each layer's weight matrices across GPUs. NVLink or NVSwitch interconnect is required for optimal performance.
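As an illustration, here is how a 2-way tensor-parallel, FP8 deployment might be launched with vLLM; the model identifier is a placeholder, not a confirmed repo name, and tensorrt-llm (used in the recommendations below) exposes an equivalent TP-size setting:

```python
# Hypothetical 2-way tensor-parallel launch with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(
    model="MiniMaxAI/MiniMax-M2.7",  # assumed identifier, for illustration only
    tensor_parallel_size=2,          # split each layer's weights across 2 GPUs
    quantization="fp8",              # matches the recommended FP8 config
)
out = llm.generate(
    ["Explain tensor parallelism in one sentence."],
    SamplingParams(max_tokens=64),
)
print(out[0].outputs[0].text)
```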
GPU Compatibility Matrix
MiniMax M2.7 is compatible with 2% of the evaluated GPU configurations (41 GPUs at 3 precision levels).
GPU Recommendations
FP8 · 2 GPUs · tensorrt-llm
100/100
score
Throughput
280.0 tok/s
Latency (ITL)
3.6ms
Est. TTFT
1ms
Cost/Month
$19,929
Cost/M Tokens
$27.08
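The $/M-token figure is derivable from the monthly cost and sustained throughput; a quick check, assuming an average month of 365/12 days at full utilization:

```python
# Reproduce Cost/M Tokens from Cost/Month and throughput (full utilization).
SECONDS_PER_MONTH = 365 / 12 * 86_400  # average month

def cost_per_million(monthly_usd: float, tok_per_s: float) -> float:
    tokens_per_month = tok_per_s * SECONDS_PER_MONTH
    return monthly_usd / (tokens_per_month / 1e6)

print(round(cost_per_million(19_929, 280.0), 2))  # -> 27.08
```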
FP8 · 4 GPUs · tensorrt-llm
98/100
score
Throughput
280.0 tok/s
Latency (ITL)
3.6ms
Est. TTFT
1ms
Cost/Month
$17,044
Cost/M Tokens
$23.16
FP8 · 4 GPUs · tensorrt-llm
98/100
score
Throughput
280.0 tok/s
Latency (ITL)
3.6ms
Est. TTFT
1ms
Cost/Month
$17,082
Cost/M Tokens
$23.21
Deployment Options
API Deployment
minimax
$2.80/M
output tokens
Single GPU
Requires multi-GPU setup (456 GB VRAM needed at FP8)
Multi-GPU
B200 NVL (pair) x2
280.0 tok/s
TP · $19,929/mo
API Pricing Comparison
| Provider | Input $/M | Output $/M | Badges |
|---|---|---|---|
| minimax | $0.70 | $2.80 | Cheapest |
Cost Analysis
| Provider | Input $/M | Output $/M | ~Monthly Cost |
|---|---|---|---|
| minimax (Best Value) | $0.70 | $2.80 | $18 |
Cost per 1,000 Requests
Short (500 tok)
$0.91
via minimax
Medium (2K tok)
$3.64
via minimax
Long (8K tok)
$11.20
via minimax
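These per-request figures depend on the assumed input/output token split, which is not stated here; a generic cost helper using the minimax rates above, with an illustrative split:

```python
# Per-request API cost at minimax rates ($0.70/M input, $2.80/M output).
# The input/output split per request is an assumption, not from this page.
IN_RATE, OUT_RATE = 0.70, 2.80  # USD per million tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1e6 * IN_RATE + output_tokens / 1e6 * OUT_RATE

# e.g. 1,000 requests with 1,500 input + 500 output tokens each:
print(f"${1000 * request_cost(1500, 500):.2f}")  # -> $2.45
```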
Performance Estimates
Throughput by GPU
VRAM Breakdown (B200 NVL (pair), FP8)
Precision Impact
bf16
456.0 GB
weights/GPU
fp8
228.0 GB
weights/GPU
~280.0 tok/s
int4
114.0 GB
weights/GPU
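The per-GPU figures are simply the total weight footprint divided across the 2-way tensor-parallel pair; a quick check:

```python
# Per-GPU weight footprint under 2-way tensor parallelism (B200 NVL pair).
TOTAL_WEIGHTS_GB = {"bf16": 912.0, "fp8": 456.0, "int4": 228.0}
TP_DEGREE = 2

for precision, total in TOTAL_WEIGHTS_GB.items():
    print(f"{precision}: {total / TP_DEGREE:.1f} GB/GPU")  # 456.0, 228.0, 114.0
```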
Quality Benchmarks
Capabilities
Features
Supported Frameworks
Supported Precisions
Where to Deploy MiniMax M2.7
Similar Models
MiniMax-Text-01
456B params · moe
Quality: 50
from $5.00/M
Snowflake Arctic 480B
480B params · moe
Quality: 50
from $1.50/M
Llama 3.1 405B
405B params · dense
Quality: 81
from $3.00/M
Llama 4 Maverick
400B params · moe
Quality: 84
from $1.80/M
Jamba 1.5 Large
398B params · hybrid
Quality: 50
from $8.00/M
Frequently Asked Questions
How much VRAM does MiniMax M2.7 need for inference?
MiniMax M2.7 requires approximately 912.0 GB of VRAM at BF16 precision, 456.0 GB at FP8, or 228.0 GB at INT4 quantization. Additional VRAM is needed for KV-cache (131,072 bytes, about 128 KB, per token) and activations (~4.00 GB).
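Using the per-token KV-cache figure from the answer, total serving VRAM can be estimated as weights + KV-cache + activations; a sketch assuming FP8 weights and a hypothetical 100K-token working set:

```python
# Rough total-VRAM estimate: FP8 weights + KV-cache + activations.
WEIGHTS_GB = 456.0            # FP8, from the answer above
KV_BYTES_PER_TOKEN = 131_072  # ~128 KB/token, from the answer above
ACTIVATIONS_GB = 4.0

tokens_in_flight = 100_000    # illustrative assumption, not from this page
kv_gb = tokens_in_flight * KV_BYTES_PER_TOKEN / 1e9
print(f"~{WEIGHTS_GB + kv_gb + ACTIVATIONS_GB:.0f} GB")  # ~473 GB
```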
What is the best GPU for MiniMax M2.7?
The top recommended configuration for MiniMax M2.7 is the B200 NVL (pair) x2 using FP8 precision. It achieves approximately 280.0 tokens/sec at an estimated cost of $19,929/month ($27.08/M tokens). Score: 100/100.
How much does MiniMax M2.7 inference cost?
MiniMax M2.7 API inference starts from $0.70/M input tokens and $2.80/M output tokens. Self-hosted inference costs depend on your GPU configuration — use our ROI calculator for a detailed breakdown.
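Whether the API or self-hosting wins depends on volume. A sketch of the break-even point, comparing output tokens only (ignoring input-token charges) against the recommended self-hosted configuration's fixed monthly bill:

```python
# Break-even volume where API spend matches the self-hosted monthly cost.
API_OUT_RATE = 2.80           # $/M output tokens (minimax)
SELF_HOSTED_MONTHLY = 19_929  # B200 NVL (pair), from the recommendation above

breakeven_m_tokens = SELF_HOSTED_MONTHLY / API_OUT_RATE
print(f"{breakeven_m_tokens:,.1f}M output tokens/month")  # -> 7,117.5M
```

Since 7,117.5M output tokens/month far exceeds the ~736M tokens/month the pair sustains at 280 tok/s, the API path stays cheaper at these list prices unless hardware cost or utilization assumptions change.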