Alpamayo 1.5-10B
NVIDIA · dense · 10B parameters · 8,192 context
Parameters
10B
Context Window
8K tokens
Architecture
Dense
Best GPU
A100 40GB SXM
Intelligence Brief
Alpamayo 1.5-10B is a 10B-parameter dense model from NVIDIA, featuring Grouped Query Attention (GQA) with 32 layers and a 4,096-dimensional hidden state. With an 8,192-token context window, it supports vision and reasoning workloads. For self-hosted inference, the A100 40GB SXM delivers optimal throughput at an estimated $807/month.
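The weight footprints quoted on this page follow directly from the parameter count and the bytes each precision needs per parameter. A minimal sketch, assuming 1 GB = 10⁹ bytes and ignoring per-layer overhead:

```python
# Rough weight-memory estimate for a dense model: parameter count
# times bytes per parameter. Assumes 1 GB = 1e9 bytes.
def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate on-GPU weight footprint in GB."""
    return params_billions * bytes_per_param

# Alpamayo 1.5-10B at the three precisions listed below:
print(weight_memory_gb(10, 2.0))   # BF16 (2 bytes/param) -> 20.0
print(weight_memory_gb(10, 1.0))   # FP8  (1 byte/param)  -> 10.0
print(weight_memory_gb(10, 0.5))   # INT4 (0.5 byte/param) -> 5.0
```

These match the BF16 / FP8 / INT4 figures in the Memory Requirements section; real deployments add KV-cache and activation memory on top.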
Architecture Details
Memory Requirements
BF16 Weights
20.0 GB
FP8 Weights
10.0 GB
INT4 Weights
5.0 GB
GPU Compatibility Matrix
Alpamayo 1.5-10B is compatible with 89% of the GPU configurations we track, spanning 41 GPUs at 3 precision levels.
GPU Recommendations
Config                Score    Throughput    Latency (ITL)   Est. TTFT   Cost/Month   Cost/M Tokens
BF16 · 1 GPU · vLLM   95/100   419.8 tok/s   2.4 ms          0 ms        $807         $0.73
BF16 · 1 GPU · vLLM   95/100   483.8 tok/s   2.1 ms          0 ms        $845         $0.66
BF16 · 1 GPU · vLLM   95/100   419.8 tok/s   2.4 ms          0 ms        $655         $0.59
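The Cost/M Tokens figures above are consistent with dividing monthly GPU cost by tokens generated per month at full utilization. A sketch, assuming a 730-hour billing month (365 × 24 / 12, a common cloud convention; the source does not state its exact assumption):

```python
HOURS_PER_MONTH = 730  # 365*24/12 -- assumed cloud-billing convention

def cost_per_million_tokens(monthly_cost_usd: float, tok_per_s: float) -> float:
    """USD per million tokens at sustained full-utilization throughput."""
    tokens_per_month = tok_per_s * HOURS_PER_MONTH * 3600
    return monthly_cost_usd / tokens_per_month * 1e6

# Reproducing the three recommendation rows:
print(round(cost_per_million_tokens(807, 419.8), 2))  # 0.73
print(round(cost_per_million_tokens(845, 483.8), 2))  # 0.66
print(round(cost_per_million_tokens(655, 419.8), 2))  # 0.59
```

Note this is a best-case figure: any idle time or batching inefficiency raises the effective cost per token.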
Deployment Options
API Deployment
No API pricing available
Single GPU
A100 40GB SXM
$807/mo
Min VRAM: 10 GB
Multi-GPU
A4000 ×2
194.0 tok/s
TP (tensor parallel) · $323/mo
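Why a pair of 16 GB A4000s can host 20 GB of BF16 weights: tensor parallelism shards the weight matrices across GPUs. A rough sketch, assuming an even split and ignoring replicated components such as embeddings:

```python
def per_gpu_weight_gb(total_weight_gb: float, tp_degree: int) -> float:
    """Tensor parallelism splits weights roughly evenly across GPUs."""
    return total_weight_gb / tp_degree

# 20 GB of BF16 weights over 2 GPUs -> 10 GB each, leaving roughly
# 6 GB per A4000 (16 GB card) for KV cache and activations.
print(per_gpu_weight_gb(20.0, 2))  # 10.0
```

The trade-off is visible in the table: the 2-GPU A4000 setup is cheaper ($323/mo) but slower (194.0 tok/s) than a single A100, since every layer incurs inter-GPU communication.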
API Pricing Comparison
No API pricing data available for this model.
Performance Estimates
Throughput by GPU
VRAM Breakdown (A100 40GB SXM, BF16)
Precision Impact

Precision   Weights/GPU   Est. Throughput
BF16        20.0 GB       ~419.8 tok/s
FP8         10.0 GB       —
INT4        5.0 GB        —
Capabilities
Features
Supported Frameworks
Supported Precisions
Where to Deploy Alpamayo 1.5-10B
Similar Models
Falcon 3 10B
10.3B params · dense
Quality: 50
GLM-4 9B
9.4B params · dense
Quality: 50
from $0.15/M
ChatGLM4 9B
9.4B params · dense
Quality: 50
SOLAR 10.7B
10.7B params · dense
Quality: 50
from $0.30/M
Gemma 2 9B
9.2B params · dense
Quality: 68
from $0.10/M
Frequently Asked Questions
How much VRAM does Alpamayo 1.5-10B need for inference?
Alpamayo 1.5-10B requires approximately 20.0 GB of VRAM at BF16 precision, 10.0 GB at FP8, or 5.0 GB at INT4 quantization. Additional VRAM is needed for the KV cache (131,072 bytes, i.e. 128 KB, per token) and activations (~1.0 GB).
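Putting those numbers together, total VRAM for a single full-context sequence at BF16 can be estimated as weights + KV cache + activations. A sketch using the figures above (the 131,072 bytes/token is consistent with 2 × 32 layers × a 1,024-wide KV projection × 2 bytes, e.g. 8 KV heads of head dim 128 under GQA — that head split is an assumption, not stated by the source):

```python
def total_vram_gb(weights_gb: float, kv_bytes_per_token: int,
                  context_tokens: int, activations_gb: float) -> float:
    """Weights + KV cache + activations, in GB (1 GB = 1e9 bytes)."""
    kv_gb = kv_bytes_per_token * context_tokens / 1e9
    return weights_gb + kv_gb + activations_gb

# BF16 weights, full 8,192-token context, one sequence:
print(round(total_vram_gb(20.0, 131072, 8192, 1.0), 2))  # 22.07
```

At ~22 GB for one full-context sequence, the 40 GB A100 recommended above leaves headroom for batching, which is where the quoted throughput comes from.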
What is the best GPU for Alpamayo 1.5-10B?
The top recommended GPU for Alpamayo 1.5-10B is the A100 40GB SXM using BF16 precision. It achieves approximately 419.8 tokens/sec at an estimated cost of $807/month ($0.73/M tokens). Score: 95/100.
How much does Alpamayo 1.5-10B inference cost?
Alpamayo 1.5-10B inference costs vary by provider and GPU setup. Use our calculator for detailed cost estimates across all providers.