Updated minutes ago · Sources: GPU Pricing, API Token Pricing, Model Registry

Yi-Large

01.AI · moe · 102.6B parameters · 32,768-token context

Quality: 74.0


Architecture Details

Type: MoE

Total Parameters: 102.6B

Active Parameters: 24B

Layers: 64

Hidden Dimension: 8,192

Attention Heads: 64

KV Heads: 8

Head Dimension: 128

Vocab Size: 64,000

Total Experts: 32

Active Experts: 4
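The listed architecture values can be cross-checked against each other. The sketch below is only a sanity check under common conventions: it assumes the head dimension equals hidden dimension divided by attention heads, and that having fewer KV heads than attention heads indicates grouped-query attention. Neither assumption is stated by the listing itself.

```python
# Architecture values copied from the listing above.
hidden_dim = 8192
attention_heads = 64
kv_heads = 8
head_dim = 128
total_params_b = 102.6
active_params_b = 24.0
total_experts = 32
active_experts = 4

# Assumption: head dimension is hidden_dim / attention_heads.
derived_head_dim = hidden_dim // attention_heads

# Assumption: grouped-query attention, so several query heads share one KV head.
queries_per_kv_head = attention_heads // kv_heads

# Share of experts and of parameters that are active per token.
expert_fraction = active_experts / total_experts
active_param_fraction = active_params_b / total_params_b

print("head dim matches listing:", derived_head_dim == head_dim)
print("query heads per KV head:", queries_per_kv_head)
print(f"active experts: {expert_fraction:.1%}")
print(f"active parameters: {active_param_fraction:.1%}")
```

The active-parameter share is larger than the active-expert share, which would be expected if some parameters (attention, embeddings) are used for every token regardless of routing.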

Memory Requirements

BF16 Weights: 205.2 GB

FP8 Weights: 102.6 GB

INT4 Weights: 51.3 GB

KV-Cache per Token: 262,144 bytes

Activation Estimate: 2.50 GB
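The weight and KV-cache figures above can be reproduced from the architecture values. The sketch below assumes 1 GB = 1e9 bytes, that weight memory is simply parameter count times bytes per parameter, and that the KV-cache figure corresponds to 2-byte (BF16) cache entries. The function names are illustrative, not part of any tool.

```python
def weight_memory_gb(params_billions, bytes_per_param):
    """Weights only: parameter count times bytes per parameter, in GB (1e9 bytes)."""
    return params_billions * bytes_per_param


def kv_cache_bytes_per_token(layers, kv_heads, head_dim, bytes_per_value):
    """One key vector and one value vector per KV head in every layer."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value


TOTAL_PARAMS_B = 102.6
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "INT4": 0.5}

for name, nbytes in BYTES_PER_PARAM.items():
    print(f"{name}: {weight_memory_gb(TOTAL_PARAMS_B, nbytes):.1f} GB")

# Assumption: KV cache stored in BF16 (2 bytes per value).
per_token = kv_cache_bytes_per_token(layers=64, kv_heads=8, head_dim=128, bytes_per_value=2)
print("KV cache per token:", per_token, "bytes")

# KV cache for one sequence at the full 32,768-token context, in GiB.
full_context_gib = per_token * 32768 / 2**30
print("KV cache at full context:", full_context_gib, "GiB")
```

Under these assumptions a single full-length sequence needs about 8 GiB of KV cache on top of the weights, which matters when judging how much headroom a GPU leaves.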

Fits on (single-node)

B200 SXM FP8 · B100 SXM FP8 · GB200 NVL72 (per GPU) FP8 · GB300 NVL72 (per GPU) FP8 · H200 SXM FP8 · H100 SXM INT4 · H100 PCIe INT4 · H100 NVL INT4
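A rough way to see why a GPU appears at one precision but not another is to subtract weights and the activation estimate from device memory and count how many tokens of KV cache still fit. The sketch below is a simplification, not the page's actual fit rule: the memory capacities (80 GB for H100 SXM, 141 GB for H200 SXM) are nominal values supplied here as assumptions, and framework overhead and fragmentation are ignored.

```python
GB = 1e9  # decimal gigabytes, matching the weight figures above


def kv_tokens_that_fit(gpu_memory_gb, weights_gb, activation_gb, kv_bytes_per_token):
    """Tokens of KV cache that fit after weights and activations; 0 if the model does not fit."""
    free_bytes = (gpu_memory_gb - weights_gb - activation_gb) * GB
    return max(0, int(free_bytes // kv_bytes_per_token))


ACTIVATION_GB = 2.50
KV_BYTES_PER_TOKEN = 262144

# (GPU, assumed memory in GB, precision, weights in GB)
cases = [
    ("H200 SXM", 141, "FP8", 102.6),
    ("H100 SXM", 80, "FP8", 102.6),
    ("H100 SXM", 80, "INT4", 51.3),
]

for gpu, memory_gb, precision, weights_gb in cases:
    tokens = kv_tokens_that_fit(memory_gb, weights_gb, ACTIVATION_GB, KV_BYTES_PER_TOKEN)
    status = f"fits, room for ~{tokens:,} KV tokens" if tokens > 0 else "does not fit"
    print(f"{gpu} {precision}: {status}")
```

With these assumed capacities, H100 SXM cannot hold the FP8 weights but has room at INT4, which is consistent with the list above.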

GPU Recommendations

B200 SXM (optimal)

FP8 · 1 GPU · tensorrt-llm

Score: 100/100

Throughput: 280.0 tok/s

Cost/Month: $4,261

Cost/M Tokens: $5.79

B100 SXM (optimal)

FP8 · 1 GPU · tensorrt-llm

Score: 100/100

Throughput: 280.0 tok/s

Cost/Month: $4,271

Cost/M Tokens: $5.80

GB200 NVL72 (per GPU) (optimal)

FP8 · 1 GPU · tensorrt-llm

Score: 100/100

Throughput: 280.0 tok/s

Cost/Month: $6,169

Cost/M Tokens: $8.38
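The Cost/M Tokens figures above appear consistent with dividing the monthly cost by the tokens generated at the listed throughput over a 730-hour month at full utilization. The 730-hour month is an inference from the numbers, not something the page states, and the function name is illustrative.

```python
HOURS_PER_MONTH = 730  # assumption: 8,760 hours per year / 12


def cost_per_million_tokens(monthly_cost_usd, tokens_per_second, hours_per_month=HOURS_PER_MONTH):
    """Monthly cost divided by millions of tokens produced at 100% utilization."""
    tokens_per_month = tokens_per_second * 3600 * hours_per_month
    return monthly_cost_usd / (tokens_per_month / 1e6)


# Monthly costs (USD) and throughput copied from the recommendations above.
monthly_costs = {
    "B200 SXM": 4261,
    "B100 SXM": 4271,
    "GB200 NVL72 (per GPU)": 6169,
}

for gpu, monthly in monthly_costs.items():
    print(f"{gpu}: ${cost_per_million_tokens(monthly, 280.0):.2f} per M tokens")
```

Because the calculation assumes the GPU is busy every hour of the month, the effective cost per million tokens rises in proportion as utilization drops.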

API Pricing Comparison

Provider	Input $/M	Output $/M	Badges
01ai	$3.00	$3.00	Cheapest
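To put the API price next to the self-hosting figures on this page, one can compute the monthly token volume at which API spend would equal a GPU's monthly cost, and compare it with what that GPU could produce. This is a minimal sketch under stated assumptions: a 730-hour month, the listed 280.0 tok/s treated as sustained aggregate throughput, and a flat $3.00 per million tokens since input and output are priced the same. Function names are illustrative.

```python
def breakeven_million_tokens(monthly_gpu_cost_usd, api_price_per_million_usd):
    """Monthly volume (millions of tokens) at which API spend equals the GPU cost."""
    return monthly_gpu_cost_usd / api_price_per_million_usd


def monthly_capacity_million_tokens(tokens_per_second, hours_per_month=730):
    """Tokens a GPU could produce in a month at full utilization, in millions."""
    return tokens_per_second * 3600 * hours_per_month / 1e6


API_PRICE = 3.00          # USD per million tokens (01ai row above)
B200_MONTHLY_COST = 4261  # USD, from the B200 SXM recommendation
THROUGHPUT = 280.0        # tok/s, from the same recommendation

breakeven = breakeven_million_tokens(B200_MONTHLY_COST, API_PRICE)
capacity = monthly_capacity_million_tokens(THROUGHPUT)

print(f"break-even volume: {breakeven:.1f} M tokens/month")
print(f"single-GPU capacity: {capacity:.1f} M tokens/month")
print("self-hosting can break even:", capacity >= breakeven)
```

With the page's numbers the break-even volume exceeds what one GPU can produce, so under these assumptions the API remains cheaper per token. A different throughput figure (for example, with heavier batching) would change that conclusion.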

Quality Benchmarks

MMLU: 78.0

HumanEval: 47.0

GSM8K: 82.0

MT-Bench: 80.0

Capabilities

Features

✓ Tool Use · ✗ Vision · ✓ Code · ✓ Math · ✗ Reasoning · ✓ Multilingual · ✓ Structured Output

Supported Frameworks

vllm · sglang

Supported Precisions

BF16 (default) · FP8 · INT4

Similar Models

Command R+

104B params · dense

Quality: 78

from $2.00/M

Inflection 3

100B params · dense

Quality: 74

from $15.00/M

YaLM 100B

100B params · dense

Quality: 50

Llama 4 Scout

109B params · moe

Quality: 76

from $0.30/M

Command A

111B params · dense

Quality: 81

from $10.00/M