Yi-Lightning
01.AI · MoE · 200B parameters · 16,384-token context
Yi-Lightning is a 200B-parameter Mixture-of-Experts (MoE) model from 01.AI with 22B active parameters per forward pass and a 16,384-token context window. With 32 experts and 4 active per token, it achieves strong parameter efficiency while maintaining competitive quality scores. Based on InferenceBench analysis, the optimal deployment configuration is the B200 SXM (x2) at FP8 precision, delivering approximately 280.0 tokens/second at $11.58 per million tokens.
Architecture Details
| Attribute | Value |
|---|---|
| Architecture | Mixture-of-Experts (MoE) |
| Total parameters | 200B |
| Active parameters per forward pass | 22B |
| Experts | 32 (4 active per token) |
| Context window | 16,384 tokens |
Memory Requirements
| Precision | Weight Memory |
|---|---|
| BF16 | 400.0 GB |
| FP8 | 200.0 GB |
| INT4 | 100.0 GB |
Fits on (single-node)
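The weight figures above follow directly from the parameter count: weight memory is roughly parameters × bytes per parameter. A minimal sketch of that cross-check (it ignores KV-cache and activation overhead, which are covered in the FAQ below):

```python
# Weight memory ≈ total parameters × bytes per parameter (decimal GB, matching
# the table above). All experts of an MoE model must be resident in memory,
# so the full 200B count applies even though only 22B are active per token.
TOTAL_PARAMS = 200e9

BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "INT4": 0.5}

for precision, bytes_per_param in BYTES_PER_PARAM.items():
    weight_gb = TOTAL_PARAMS * bytes_per_param / 1e9
    print(f"{precision}: {weight_gb:.1f} GB")  # BF16: 400.0, FP8: 200.0, INT4: 100.0
```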
GPU Recommendations
| Configuration | Score | Throughput | Cost/Month | Cost/M Tokens |
|---|---|---|---|---|
| B200 SXM (x2) · FP8 · tensorrt-llm | 100/100 | 280.0 tok/s | $8522 | $11.58 |
| FP8 · 2 GPUs · tensorrt-llm | 100/100 | 280.0 tok/s | $8541 | $11.61 |
| FP8 · 2 GPUs · tensorrt-llm | 100/100 | 280.0 tok/s | $12337 | $16.77 |
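The cost-per-million-token figures above follow from dividing the monthly hardware cost by the tokens generated over a month of continuous decoding. A small sketch that reproduces them, assuming a 730-hour month and full utilization (these two assumptions are mine, not figures stated on this page):

```python
def cost_per_million_tokens(monthly_cost_usd: float, tokens_per_sec: float,
                            hours_per_month: float = 730.0) -> float:
    """Monthly hardware cost divided by millions of tokens decoded at full utilization."""
    millions_per_month = tokens_per_sec * 3600 * hours_per_month / 1e6
    return monthly_cost_usd / millions_per_month

# Reproduces the table above (all three configurations run at 280.0 tok/s):
for monthly_cost in (8522, 8541, 12337):
    per_m = cost_per_million_tokens(monthly_cost, 280.0)
    print(f"${monthly_cost}/month -> ${per_m:.2f}/M tokens")
# -> $11.58, $11.61 and $16.77 per million tokens
```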
API Pricing Comparison
| Provider | Input $/M | Output $/M | Badges |
|---|---|---|---|
| 01.AI | $0.99 | $0.99 | Cheapest |
Capabilities
Supported Frameworks: TensorRT-LLM (the framework used in the recommended configurations above)
Supported Precisions: BF16, FP8, INT4
Frequently Asked Questions
How much VRAM does Yi-Lightning need for inference?
Yi-Lightning requires approximately 400.0 GB of VRAM for weights at BF16 precision, 200.0 GB at FP8, or 100.0 GB at INT4 quantization. Additional VRAM is needed for the KV-cache (131,072 bytes, i.e. 128 KiB, per token) and activations (~2.00 GB).
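A minimal sketch of how those components combine into a total VRAM budget, assuming each concurrent sequence reserves a full 16,384-token KV-cache (a worst-case simplification; real servers allocate cache incrementally):

```python
KV_BYTES_PER_TOKEN = 131_072   # 128 KiB per token, from the answer above
ACTIVATION_GB = 2.0            # approximate activation overhead, from the answer above
CONTEXT_TOKENS = 16_384        # full context window

WEIGHT_GB = {"BF16": 400.0, "FP8": 200.0, "INT4": 100.0}

def total_vram_gb(precision: str, concurrent_sequences: int) -> float:
    """Weights + worst-case KV-cache (full context per sequence) + activations."""
    kv_gb = concurrent_sequences * CONTEXT_TOKENS * KV_BYTES_PER_TOKEN / 1e9
    return WEIGHT_GB[precision] + kv_gb + ACTIVATION_GB

# Example: FP8 weights with 8 concurrent full-context sequences.
print(f"{total_vram_gb('FP8', 8):.1f} GB")  # ≈ 219.2 GB (200 + 8 × ~2.15 + 2)
```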
What is the best GPU for Yi-Lightning?
The top recommended GPU for Yi-Lightning is the B200 SXM (x2) using FP8 precision. It achieves approximately 280.0 tokens/sec at an estimated cost of $8522/month ($11.58/M tokens). Score: 100/100.
How much does Yi-Lightning inference cost?
Yi-Lightning API inference starts from $0.99/M input tokens and $0.99/M output tokens. Self-hosted inference costs depend on your GPU configuration — use our ROI calculator for a detailed breakdown.
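As a rough companion to the ROI calculator, the sketch below compares API spend against the fixed self-hosted bill at a given monthly volume. It assumes the flat $0.99/M rate applies to both input and output tokens and uses the $8522/month B200 SXM (x2) figure above; it ignores throughput ceilings, utilization, and operational overhead. At the listed rates the API's per-token price is well below the $11.58/M self-hosted estimate, so factors beyond raw cost usually drive the decision.

```python
API_RATE_PER_M = 0.99          # $/M tokens, input and output (01.AI pricing above)
SELF_HOSTED_MONTHLY = 8522.0   # $/month, B200 SXM (x2) FP8 configuration above

def monthly_costs(total_tokens_millions: float) -> dict:
    """Pay-per-token API spend vs. the fixed self-hosted bill for one month."""
    return {
        "api_usd": total_tokens_millions * API_RATE_PER_M,
        "self_hosted_usd": SELF_HOSTED_MONTHLY,
    }

# Example: 500M tokens per month (input + output combined).
print(monthly_costs(500))  # {'api_usd': 495.0, 'self_hosted_usd': 8522.0}
```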