YaLM 100B
Yandex · dense · 100B parameters · 2,048 context
Parameters: 100B
Context Window: 2K tokens
Architecture: Dense
Best GPU: B200 SXM
Intelligence Brief
YaLM 100B is a 100B-parameter dense model from Yandex, featuring Multi-Head Attention (MHA) with 80 layers and a 10,240-dimension hidden state. With a 2,048-token context window, it supports multilingual text generation. For self-hosted inference, the B200 SXM delivers the best throughput at an estimated $4261/month.
Architecture Details
Memory Requirements
BF16 weights: 200.0 GB
FP8 weights: 100.0 GB
INT4 weights: 50.0 GB
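The weight-memory figures above are just parameter count multiplied by bytes per parameter. A minimal sketch of that arithmetic, assuming exactly 100e9 parameters and decimal gigabytes (1 GB = 1e9 bytes):

```python
# Weight memory = parameter count x bytes per parameter.
# Assumes exactly 100e9 parameters; real checkpoints vary slightly.
PARAMS = 100e9

BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}

def weight_gb(precision: str) -> float:
    """Approximate weight memory in GB for a given precision."""
    return PARAMS * BYTES_PER_PARAM[precision] / 1e9

for p in ("bf16", "fp8", "int4"):
    print(f"{p}: {weight_gb(p):.1f} GB")  # 200.0 / 100.0 / 50.0 GB
```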
GPU Compatibility Matrix
YaLM 100B is compatible with 21% of the GPU configurations evaluated (41 GPUs at 3 precision levels).
GPU Recommendations
Configuration                 Score     Throughput    Latency (ITL)   Est. TTFT   Cost/Month   Cost/M Tokens
FP8 · 1 GPU · tensorrt-llm    100/100   280.0 tok/s   3.6 ms          1 ms        $4261        $5.79
FP8 · 1 GPU · tensorrt-llm    100/100   280.0 tok/s   3.6 ms          1 ms        $4271        $5.80
FP8 · 1 GPU · tensorrt-llm    100/100   280.0 tok/s   3.6 ms          1 ms        $6169        $8.38
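The cost-per-million-token figures follow from sustained generation at the listed throughput. A sketch of the arithmetic, assuming a 730-hour billing month (an assumption, but one that reproduces the listed numbers):

```python
def cost_per_million(monthly_cost: float, tok_per_s: float,
                     hours_per_month: float = 730.0) -> float:
    """Dollars per 1M generated tokens at sustained throughput.

    Assumes 100% utilization over a 730-hour month.
    """
    tokens_per_month = tok_per_s * hours_per_month * 3600.0
    return monthly_cost / (tokens_per_month / 1e6)

# 280.0 tok/s at $4261/month -> ~$5.79 per million tokens.
print(round(cost_per_million(4261, 280.0), 2))
```

At partial utilization, effective cost per token scales up proportionally, so these are best-case figures.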
Deployment Options
API Deployment: no API pricing available.
Single GPU: B200 SXM, $4261/mo (min VRAM: 100 GB)
Multi-GPU: H100 SXM x2 (tensor parallel), 280.0 tok/s, $3587/mo
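As a sanity check on the multi-GPU option, the FP8 memory figures quoted on this page fit comfortably in two 80 GB H100s. A rough sketch, where the 80 GB per-GPU capacity is an assumption and the weight, KV-cache, and activation figures come from this page:

```python
# Rough VRAM feasibility check for FP8 YaLM 100B on 2x H100 SXM.
# 80 GB per H100 SXM is an assumption; memory figures are from this page.
GPUS, VRAM_PER_GPU_GB = 2, 80.0
weights_gb = 100.0                    # FP8 weights
kv_gb = 3_276_800 * 2048 / 1e9        # full 2,048-token KV cache (~6.7 GB)
activations_gb = 3.5
total_gb = weights_gb + kv_gb + activations_gb
print(f"{total_gb:.1f} GB needed, {GPUS * VRAM_PER_GPU_GB:.0f} GB available")
```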
API Pricing Comparison
No API pricing data available for this model.
Performance Estimates
[Charts: throughput by GPU · VRAM breakdown (B200 SXM, FP8)]
Precision Impact
bf16: 200.0 GB weights/GPU
fp8: 100.0 GB weights/GPU · ~280.0 tok/s
int4: 50.0 GB weights/GPU
Capabilities
Supported frameworks: tensorrt-llm
Supported precisions: BF16, FP8, INT4
Where to Deploy YaLM 100B
Self-Hosted Infrastructure
Similar Models
Model                   Params    Arch     Quality   Price
Inflection 3            100B      dense    74        from $15.00/M
Yi-Large                102.6B    moe      74        from $3.00/M
Command R+              104B      dense    68        from $2.00/M
Llama 4 Scout           109B      moe      73        from $0.30/M
Llama 3.2 90B Vision    90B       dense    84        from $0.90/M
Frequently Asked Questions
How much VRAM does YaLM 100B need for inference?
YaLM 100B requires approximately 200.0 GB of VRAM for weights at BF16 precision, 100.0 GB at FP8, or 50.0 GB with INT4 quantization. Additional VRAM is needed for the KV cache (3,276,800 bytes, about 3.3 MB, per token) and activations (~3.50 GB).
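The per-token KV-cache figure is consistent with a full-width MHA cache stored at 16-bit precision across all 80 layers; a sketch, assuming 2-byte K/V entries:

```python
# KV cache per token = 2 (K and V) x layers x hidden dim x bytes per entry.
# Assumes a full-width MHA cache stored at 16-bit precision.
LAYERS, HIDDEN, KV_BYTES = 80, 10240, 2
per_token = 2 * LAYERS * HIDDEN * KV_BYTES
print(per_token)               # 3276800 bytes per token, as stated above
print(per_token * 2048 / 1e9)  # ~6.7 GB for a full 2,048-token context
```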
What is the best GPU for YaLM 100B?
The top recommended GPU for YaLM 100B is the B200 SXM using FP8 precision. It achieves approximately 280.0 tokens/sec at an estimated cost of $4261/month ($5.79/M tokens). Score: 100/100.
How much does YaLM 100B inference cost?
YaLM 100B inference costs vary by provider and GPU setup. Use our calculator for detailed cost estimates across all providers.