Eagle 2 9B
NVIDIA · dense · 9B parameters · 8,192 context
Parameters
9B
Context Window
8K tokens
Architecture
Dense
Best GPU
RTX 5090
Intelligence Brief
Eagle 2 9B is a 9B parameter DENSE model from NVIDIA, featuring Grouped Query Attention (GQA) with 32 layers and 4,096 hidden dimensions. With a 8,192 token context window, it supports vision. For self-hosted inference, RTX 5090 delivers optimal throughput at $845/month.
Architecture Details
Memory Requirements
BF16 Weights
18.0 GB
FP8 Weights
9.0 GB
INT4 Weights
4.5 GB
GPU Compatibility Matrix
Eagle 2 9B is compatible with 90% of GPU configurations across 41 GPUs at 3 precision levels.
GPU Recommendations
BF16 · 1 GPU · vllm
95/100
score
Throughput
537.6 tok/s
Latency (ITL)
1.9ms
Est. TTFT
0ms
Cost/Month
$845
Cost/M Tokens
$0.60
BF16 · 1 GPU · vllm
95/100
score
Throughput
93.7 tok/s
Latency (ITL)
10.7ms
Est. TTFT
2ms
Cost/Month
$180
Cost/M Tokens
$0.73
BF16 · 1 GPU · vllm
95/100
score
Throughput
113.3 tok/s
Latency (ITL)
8.8ms
Est. TTFT
2ms
Cost/Month
$380
Cost/M Tokens
$1.28
Deployment Options
API Deployment
No API pricing available
Single GPU
RTX 5090
$845/mo
Min VRAM: 9 GB
Multi-GPU
A4000 x2
213.3 tok/s
TP· $323/mo
API Pricing Comparison
No API pricing data available for this model.
Performance Estimates
Throughput by GPU
VRAM Breakdown (RTX 5090, BF16)
Precision Impact
bf16
18.0 GB
weights/GPU
~537.6 tok/s
fp8
9.0 GB
weights/GPU
int4
4.5 GB
weights/GPU
Capabilities
Features
Supported Frameworks
Supported Precisions
Where to Deploy Eagle 2 9B
Self-Hosted Infrastructure
Similar Models
Eagle 2.5 8B
8B params · dense
Quality: 65
Yi 1.5 9B
8.83B params · dense
Quality: 62
from $0.20/M
Yi Coder 9B
8.8B params · dense
Quality: 50
Gemma 2 9B
9.2B params · dense
Quality: 68
from $0.10/M
GLM-4 9B
9.4B params · dense
Quality: 50
from $0.15/M
Frequently Asked Questions
How much VRAM does Eagle 2 9B need for inference?
Eagle 2 9B requires approximately 18.0 GB of VRAM at BF16 precision, 9.0 GB at FP8, or 4.5 GB at INT4 quantization. Additional VRAM is needed for KV-cache (131072 bytes per token) and activations (~0.50 GB).
What is the best GPU for Eagle 2 9B?
The top recommended GPU for Eagle 2 9B is the RTX 5090 using BF16 precision. It achieves approximately 537.6 tokens/sec at an estimated cost of $845/month ($0.60/M tokens). Score: 95/100.
How much does Eagle 2 9B inference cost?
Eagle 2 9B inference costs vary by provider and GPU setup. Use our calculator for detailed cost estimates across all providers.